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SUMMARY' 

In the uniqueness part of a geophysical inverse problem, the 
observer wants to predict all likely values of P unknown numerical 

properties z =(zj z P ) of the earth from measurement of D other 

numerical properties y (0) =(y f°\...,yi 0) ), using full or partial 
knowledge of the statistical distribution of the random errors in y (0) . 
The data space Y containing y (0) is D -dimensional, so when the 
model space X is infinite-dimensional the linear uniqueness prob- 
lem usually is insoluble without prior information about the correct 
earth model x. If that information is a quadratic bound on x (e.g., 
energy or dissipation rate), Bayesian inference (BI) and stochastic 
inversion (SI) inject spurious structure into x, implied by neither 
the data nor the quadratic bound. Confidence set inference (CSI) 
provides an alternative inversion technique free of this objection. 
The first step in CSI is to estimate unmodelled systematic errors in 
y- 0) and z. The second step is to choose any finite-dimensional sub- 
space X N of X , and to use the prior quadratic bound to estimate the 
truncation error when the full data function F :X —>Y in the for- 
ward problem is approximated by restricting it to X N to give a 
finite-dimensional function F N :X N — > Y. Step three calculates the 
eigenstructure (singular value decomposition) of Fn- Step 4 uses 
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this eigenstructure to find for each positive p < 1 a Neyman subset 
K z (p) of the P -dimensional prediction space Z such that either the 
correct value of the prediction vector z is a member of the 
confidence set K z (p) or an event has occurred whose probability 
was no more thanp. In contrast to SI and BI, CSI offers no incen- 
tive for considering any value of P except 1. CSI is illustrated in 
the problem of estimating the geomagnetic field B at the core- 
mantle boundary (CMB) from components of B measured on or 
above the earth’s surface. Neither the heat flow nor the energy 
bound is strong enough to permit estimation of B r at single points 
on the CMB, but the heat flow bound permits estimation of uniform 
averages of B r over discs on the CMB, and both bounds permit 
weighted disc-averages with continous weighting kernels. Both 
bounds also permit estimation of low-degree Gauss coefficients at 
the CMB. The heat flow bound resolves them up to degree 8 if the 
crustal field at satellite altitudes must be treated as a systematic 
error, but can resolve to degree 1 1 under the most favorable statisti- 
cal treatment of the crust. These two limits produce circles of con- 
fusion on the CMB with diameters of 25° and 19° respectively. 

Key words: confidence sets, inverse problems, core-mantle boun- 
dary field 
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1 INTRODUCTION 

In geophysical inverse problems, the data are finitely many real numbers y ! (0) , ...,y^ 0) measured 
with unknown observational errors 8 y P , .... 8 yp°\ The goal is to use the data to estimate proper- 
ties of the real earth. A common intermediate step is the existence problem, to find one physi- 
cally reasonable earth model x (0) which fits the data acceptably (i.e., within the random errors of 
observation and the systematic errors of modelling). Usually the model space X of all possible 
earth models x is infinite dimensional, so many physically reasonable models besides x (0) will fit 
the data acceptably. Thus besides the existence problem there is a uniqueness problem, to esti- 
mate the "error" in x (0) , i.e., to discover how much the real earth might deviate from x (0) . 

The uniqueness problem must be carefully formulated if it is to be amenable to quantitative 
solution. Collect the data into an observed data vector y (0) = (y P , .... y^ 0) ), and let Y be the data 
space, the vector space 0 inear space) of all possible D -dimensional data vectors. The theory of 
the forward problem provides a function F which assigns to each earth model x in X the value 
F (x) which y (0) would take if x were the correct earth model and there were no errors. The errors 
in y' 0) are of two types: the random error of observation, <5y = ( 8 y j, .... 8 y D )\ and the systematic 
errors = (77 j, ...,77 D ) due to the inadequacy of the model space X . If x is the correct model, then 

y (°) = F (x) + 5 y + 77 . (1.1a) 

The hope is to use the data y (0) to estimate properties of the real earth. The complete infor- 
mation content of an arbitrary infinite-dimensional model vector x can be neither registered in a 
finite computer nor comprehended by a finite observer, so in any real inverse problem the data 
y(°> will be used to estimate a finite number of properties of the earth. In the present paper it will 
be assumed that these properties can be described by a finite list of real numbers which together 
constitute a "prediction vector," 2 = (2 ],..., z F ), in the P -dimensional "prediction space" Z. The 
theory of the forward problem provides a function G which assigns to each model x in X the 
value G (x) that z would take if x were the correct model and there were no errors. Thus 
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z = G(x)+C (1.1b) 

where £ is the systematic error produced by inadequacies in the model space. Since (1.1b) is a 
prediction rather than an observation, it contains no random error 8z of measurement. That ran- 
dom error would enter only in a later attempt to verify the prediction by direct measurement, and 
it need not encumber the inverse problem. 

The uniqueness part of the inverse problem consists in using y (0) in (1.1a) to put limits on x, 
and transferring those limits to z via (1.1b). Of course this program is doomed unless estimates 
are available for the errors Sy, rj and £. By definition, a systematic error like 77 or £ is one about 
which only inequalities are known, while the random error 5y can be thought of as drawn at ran- 
dom from a population of error vectors in Y with a probability distribution p E on Y . 

Even if p E for 8y and inequalities for 77 and f are known, the uniqueness problem is usually 
insoluble for another reason, basically because dim Y < dimX , so (1.1) has more unknowns than 
equations. The difficulty is obvious in the linear case. Let F, and Gy be the real-valued functions 
onX defined by 

F(x) = (F,(x) F P (x)) (1.2a) 

and 

G(x) = (G,(x) Gp (x)) . (1.2b) 

If F and G are linear, so are the data functionals F ( - and the prediction functionals Gy. If each Gy 
is a linear combination of F\,...,F D , then estimates of z are possible (Backus & Gilbert, 1968, 
1970), but otherwise the set of z’s permitted by the data is unbounded (Backus, 1970a). 

When the prediction functionals are not linear combinations of the data functionals, the 
observer who w’ants to estimate (i.e., bound) z from (1.1b) must invoke more information about 
the earth than is contained in (1.1a). This "prior information" comes from previous knowledge 
about the earth in particular and about physics and chemistry in general. For example, in 
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attempting to invert seismic data to find the density p and seismic velocities V P and , the 
seismologist is quite sure that p , V P and V s arc positive even' where, and is rather confident of 
upper bounds for them based on laboratory measurements, the theory of condensed matter, and 
solar and astronomical observations which contribute to the discussion of the likely range of 
chemical compositions of the earth. Examples of prior information in modelling the geomagnetic 
field are that its total energy cannot have a rest mass greater than that of the earth (the energy 
bound), and that the minimum ohmic heating rate required to sustain it is probably less than the 
total rate of geothermal heat flow observed at the earth’s surface (the heat flow bound). 

The seismic prior information just mentioned can be stated in terms of "linear bounds." 
There are M linear functions /,-, i = 1, ...,M, which assign real numbers /,• (x) to all earth models 
x, and there are 2N real numbers a t < 6; such that the correct x is known to satisfy 

a,- </, (x) < b { (1.3) 

for i = 1, ...,Af . In the seismic example, the model x is the triple of functions p, V P , Vg, and 
f i (x) is the value of one ofp, V P or V s at a particular location r ; in the earth. Evidently M =«> 
in the seismic case. This is essential, because for M <°° the / i,...,/ w simply augment the 
number of data functionals from D to D + M , and if dim X = «= it is still true that 
D +M < dimX. When M = «, linear constraints are extremely useful in resolving nonunique- 
ness. Seismic examples are given by Wiggins et al. (1973), Garmany (1979), Orcutt (1980), 
Stark et al. (1986), Starit (1987, 1988), and Stark and Parker (1987). Examples from gravity 
inversion appear in Parker (1974, 1975), and examples from geomagnetic prospecting are given 
by Oldenburg (1983) and Huestis and Parker (1977). McNutt and Royden (1988) give an appli- 
cation to the geotherm. In all these cases, the numerical inversion technique is linear program- 
ming (Gass, 1985) but many of the problems are nonlinear and involve very' adroit use of the con- 
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The energy and heat flow bounds in the geomagnetic modelling problem have a mathemati- 
cal character different from (1.3). They are both of the form 

Q (x, x) < <7 (1.4a) 

where q is a known real number and Q is a known positive-definite quadratic form in the model 
x. That is, Q (x, x) is a homogeneous quadratic polynomial in x, and 

Q (x, x) > 0 (1.4b) 

for every nonzero model x. The bound (1.4) is a prior quadratic bound on the correct earth model 
x. A single prior quadratic bound on x always confines the prediction vector z to a bounded sub- 
set K z of the prediction space Z , whereas infinitely many linear bounds are needed to do so if 
dimAT=«>. Whether A ' 2 is small enough to be useful must usually be settled by numerical calcu- 
lation. One of the conclusions reached in this paper is that very lax prior quadratic bounds on x 
can produce surprisingly tight bounds on z if the data are adequate. The use of a prior quadratic 
bound was suggested by Backus (1970a), Jackson (1979) and Gubbins and Bloxham (1985). 

Jackson (1979) calls (1.3) and (1.4) "hard bounds” on the correct earth model x, as opposed 
to "soft bounds." A soft bound is a probability distribution p x on the model space X, which 
describes where in X the correct model x is likely to be. A prior soft bound p x can be subjective 
or objective. If subjective, it is the observer’s personal probability distribution for x, incorporat- 
ing all his or her knowledge about x except the data y (0) . Measuring y (0) increases the observer’s 
knowledge, and changes p x from a prior to a posterior personal probability distribution. An 
objective p x arises when there really are many different earth models, for example many different 
ocean-bottom magnetic anomaly patterns, all sampled by surface magnetometers, and some sam- 
pled by deep-towed magnetometers. The deep-towed samples constitute an objective data base 
from which p x can be estimated by standard statistical techniques. This p x can then be used to 
interpret a surface record from a new area. 
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The two currently popular methods for incorporating prior information into the inverse 
problem (1.1) are stochastic inversion (SI) and Bayesian inference (BI). Both methods use soft 
prior information, a probability distribution p x for models x in X. Stochastic inversion is essen- 
tially minimum variance estimation, while Bayesian inference uses Bayes’ theorem to combine 
p x with y (0) and the error distribution p E to produce a new probability distribution p x for x in X. 
Both SI and BI can be applied to either a subjective or an objective prior p x . When p x and p E 
are Gaussian and (1.1) is linear, BI and SI produce the same results. A review of the two methods 
and a bibliography appears in Backus (1988a). Tarantola’s (1987) book about BI appeared after 
that bibliography was assembled. 

In the geophysical literature it has become common practice to apply BI and SI to hard prior 
information like (1.3) and (1.4). This requires that first the hard information be "softened" to a 
prior probability distribution p x . Backus (1970c, 1988a), Jackson (1979) and Gubbins and Blox- 
ham (1985) discuss the softening process. Backus (1988b) recently showed that when 
dimX = eo, softening the hard quadratic bound (1.4) to a probability distribution inevitably adds 
spurious information about the correct model x. This new information is subtle, and can easily 
escape the observer’s attention. It is definitely not implied by the original inequality (1.4), and is 
of a character likely to be unacceptable to most observers. For example, every p x which adds no 
other information to (1.4) will assign prior probability zero to the set of models x for which 
Q (x, x) is finite. When dim X = «> and the prior information is a quadratic bound (1.4), neither SI 
nor BI is a suitable technique for resolving nonuniqueness in the inverse problem (1.1). 

The present paper develops Neyman’s (1937) theory of confidence sets as a replacement for 
BI and SI when the prior information is a hard quadratic bound (1.4). The general idea of 
confidence set inference (CSI) is described by Backus (1987) and Stark (1988), and in fact is a 
special case of a well-established scheme for statistical inference, as will be shown in sections 3 
and 4 below. With CSI, the inevitable and often very large uncertainty in the value of q in (1.4a) 
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cannot be dealt with by "softening" (1.4a) to a probability distribution on X. Therefore, conclu- 
sions drawn with CSI must be tested for sensitivity to q over a range of values. In the geomag- 
netic problem, it will be seen that altering q by several factors of 10 docs not appreciably affect 
the conclusions. 

The present paper advocates replacing BI and SI by CSI only in certain circumstances. If 
the prior information really is a probability distribution p x for x on A' , and the observer has 
confidence in it, then BI and SI are preferable to CSI. To use CSI on p x , the observer must con- 
vert px to a quadratic inequality (hardening the soft information, in the terminology of Jackson, 
1979). When dim A' = °°, this hardening process discards large amounts of prior information 
(Backus, 1988b). 

In the existence problem as well, SI and BI are unobjectionable techniques. When 
dimX » dim Y, computational schemes for constructing an x (0) which acceptably fits the data 
are prone to numerical instability because so many models are acceptable. Tikhonov (1963) and 
Tikhonov and Arsenin (1972) suggested a general remedy now called regularization: add another 
condition on x besides (1.1) which will make x (0) unique, so the computer does not go into hunt- 
ing oscillation or wander off to infinity in the high wave-number part of the model space. The 
oldest regularization scheme is simple truncation of the model space to an N -dimensional sub- 
space X N , from which an x (0) is selected by a least-squares fit to the data. If the fit is unsatisfac- 
tory', N is increased. Tilchonov considered other regularization schemes, most of which seek the 
x that acceptably fits the data and minimizes some norm on X . Geophysical examples occur in 
Backus and Gilbert (1967) and Shure et al. (1982). Both BI and SI offer such techniques, and 
when so used they solve the existence part of the inverse problem but not the uniqueness part. 
This distinction is not always made clear in the literature. 

One of the most convincing examples of prior quadratic information in the form (1.4) arises 
in trying to estimate the geomagnetic field B at the core-mantle boundary (CMB) from 
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measurements of B made on and above the surface of the earth. Therefore, the geomagnetic 
problem will be used to illustrate the general theory of CSI. The next section describes the 
geomagnetic problem in detail, and uses it to clarify some general issues in inversion. The 
remainder of the paper is a description of CSI, followed by its detailed application to the geomag- 
netic problem. 


2 THE GEOMAGNETIC INVERSE PROBLEM 

In the geomagnetic inverse problem used here to illustrate CSI, the model space X is the linear 
space of all possible geomagnetic fields B produced outside the core-mantle boundary (CMB) by 
electric currents inside the core. Every such B is defined only in the region from the CMB to 
infinity, and can be written there as 

B = -Vyz (2.1a) 

where 

~ / 

y(r) = a £ (air ) /+1 £ fifta ) *7" (0 • (2. lb) 

1=1 m=-l 

Here a is the radius of the CMB, r is the position vector measured from the center of the earth, r 
is | r | , f is r/r, and {Yr 1 , .... Yf) is any orthogonal basis for the spherical harmonics of degree /, 
normalized so that \Y t m \ 2 averages to (2/+1) -1 on the surface of the unit sphere. The/3/"(a) are 
the internal Gauss coefficients of B at the CMB. They can be used to parameterize the model 
space X . Alternatively, X can be parameterized by any parameterization of the set of scalar- 
valued functions on the CMB, because B outside the core is completely determined by the radial 
component B r on the CMB (Backus, 1986, gives several parametrizations). In this formulation 
of the inverse problem, the true values of B r on and above the CMB will include unmodelled 
contributions from sources outside the core. These are treated as errors in the data. 


September 19, 1988 


George E. Backus 


Confidence Set Inference 10 


In this geomagnetic inverse problem, the data arc 

y,- = v, • B(r,) (2.2) 

where, for i = v, is a unit vector and r, is a position on or above the surface of the earth. 

Equations (2.1) and (2.2) describe how to calculate the data produced by any model B if the 
errors vanish. Therefore, (2.1) and (2.2) give the forward data function F in (1.1a). This F is 
linear, but it would be nonlinear if some of the data were intensities or angles rather than Carte- 
sian components of B. 

For any i among 1 if v ( - is the unit vector appearing in (2.2) then the y, (0) of (1.1a) is 
the measured v ( - component of the magnetic field at r, . Thus y/ 0) includes the instrument errors, 
position errors, and the field produced by all sources outside the core (since (2.1) models only 
internal sources). These are magnetization in the crust and the satellite, and electric currents in 
the mantle, crust, ocean, ionosphere, satellite, and magnetosphere. The observer has wide lati- 
tude to apportion these contributions to y (0) among the three terms in (1.1a): F(x), Sy and J]. For 
example, the instrument and navigation errors are likely to be treated as part of the random error 
Sy. The contribution to y (0) from crustal magnetization might be unknown, but an upper bound 
on its magnitude might be available: then it would belong in the systematic error rj . Alterna- 
tively, perhaps the crustal magnetization is well-described by a two-dimensional random process 
on the surface of the earth. Then its contribution to y (0) could be included in Sy. This statistical 
hypothesis about p E , the probability distribution for Sy, could be tested by the methods described 
in section 3. Finally, if a representation for B more general than (2.1) were adopted (Backus, 
1986), any one of the sources of B could be included as part of the earth model x, and its contri- 
bution to B computed as part of F (x). 

The predictions z ; in (1.1b) might be the 224 Gauss coefficients of degree less than 15, or 
the values of B r at 300 points on or above the CMB, or the flux of B through certain null-flux 
curves on the CMB (Backus, 1968; Gubbins and Bloxham, 1985). In the first two cases, G in 
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(1.1b) is linear, and in the first case £ = 0 in (1.1b) because the prediction can be made exactly 
once the model is known. In the last two cases, £ * 0 because the prediction includes a contribu- 
tion from magnetic fields with sources outside the core. 

In the geomagnetic example, prior inequalities (1.4) can be obtained for both energy and 
dissipation rate. Einstein’s relativity requires that the rest mass of the energy in B be part of the 
total mass of the earth, as measured by surface gravity and the Cavendish experiment for 
Newton’s gravitational constant. In the notation of (2.1), it follows that (Backus, 1988a) 

©o / 

X (2/+1X/+1)- 1 X \PT( a ) 1 2 ^ 2 x 1 0 33 nT 2 (2.3) 

if the /?/"(«) arc measured in nanoTesla. Alternatively, the electrical conductivity of the core 
probably is known to within a factor of 10, and the total geothermal heat flux is known with 
somewhat better accuracy (Stacey, 1977). If ohmic dissipation in the core does not exceed the 
geothermal heat flux then (Backus, 1988a, based on Parker, 1972, and Gubbins, 1975) 

oo l 

X (/+l)(2/+l)(2/+3)r 1 X \PP(a ) | 2 ^ 3 x 10 17 nT 2 . (2.4) 

/ =1 m =>-/ 

Since the gauss coefficients are probably of the order of 10 5 nT at the CMB, bounds like (2.3) and 
(2.4) appear at first sight to be unhelpful. It is one of the counterintuitive properties of BI on 
model spaces of high dimension that geographically well-distributed measurements of B can use 
a probabilistic (softened) version of (2.4) to find l 3 °(<a ) from satellite data correct within about 
one part in 10 5 , and to find q^^a) within five percent (R.A. Langel and R.H. Estes, private com- 
munication). Whether this accuracy survives the replacement of BI by CSI will be answered by 
future computations, to be reported elsewhere. 

Other quadratic forms besides (2.3) and (2.4) have been considered in geomagnetic model- 
ling (see, e.g., Shure et al., 1982), but all have been isotropic, i.e., invariant under rotations about 
the center of the earth, and so have had expressions of the form 
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£C(/) £ \Pr(o)\ 2 <q (2-5) 

;=1 m=— / 

for various choices of C (/ ). 


3 CONFIDENCE SET INFERENCE IN GENERAL 

Confidence set inference (CSI) is an extension of the parametric method of statistical inference 
originally proposed by Neyman (1937). As a formal procedure, it applies to all inverse problems, 
linear or not. Useful results are extracted in particular problems by exploiting their special 
characteristics, such as linearity or the availability of prior information. The following discussion 
of CSI draws heavily on Chapter 34 of Cramer (1946). The necessary measure- and set-theoretic 
notations are listed in Appendix A. 

In CSI, the observer has measured a data vector y (0) = (y j (0) » — .yi 0) ) in the D -dimensional 
data space Y -R° . ( R is the real line and R° is the space of the real lxD matrices.) The 
observer knows that 

y (0) = y(°) + 77 (3.1a) 

where v^ and tj are two vectors about which only the following partial information is available: 
there is a given (usually small) subset S Y of Y such that 

rj e S Y ; (3.1b) 

and J’ (0) was drawn at random from a population whose probability distribution on Y is py. This 
p Y is not known, but it is known to belong to a given m -parameter family of probability distribu- 
tions on Y. The parameters, 6 A , will be collected into a parameter vector 

6 (9 6 A ) (3.1c) 

in the m -dimensional space Q=R m of parameter vectors. (":=" means "is defined as.") The pro- 
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bability distribution whose parameter vector is 6 will be written p(0, •). All the p(0, •). for all 
0 6 0, are assumed to have the same domain a particular cr-ring of subsets of Y. If Q e 
then the probability assigned to Q by p(9, •) will be written p(6,Q). There is no restriction on 
how thep(0, •) depend on0. They could all be Gaussian, with means and variance matrices in Y 
specified as functions of0; or they could be spline fits to an empirical distribution obtained from 
the data (Constable and Parker, 1988). Neyman (1937) requires that the function 6\->p(§, •) be 
injective, but no such demand need be made here; p{6[, )-p{0 2 , •) will not imply =0 2 - In the 
formal development of CSI, m in (3.1c) is unrestricted and can even be infinite. In practical 
applications, usually 

m « dim Y , (3. Id) 

a restriction which permits use of the observed data vector y (0) to test the hypothesis that py 
really does belong to the family p (6>, •) (Kendall and Stuart, 1979, Ch. 30). 

The observer’s goal is to use y (0) to predict P numbers z j, .... z P . Let z := (z j, ..., z P ) be the 
"prediction vector," a member of the prediction space Z=R P . Nothing is known about z except 
that 

z = G(# 0 ) + C (3.1e) 

where G : »Z is a known function, 6 0 is unknown but is known to be one of the (possibly 

many) parameter vectors 6 such that 

Py=p(e, •), (3. If) 

and C is an unknown vector about which no information is available except that 

C € S z (3.1g) 

where S z is a given subset of Z. The vectors 77 in (3.1a) and £ in (3.1e) can be thought of as unk- 
nown systematic errors. 
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The CSI solution to the problem (3.1) is to look for a subset K z qZ, which is small in 
some useful sense (to be discussed later) and which has a high probability of containing the true 
prediction vector z. This idea of probability requires careful examination because it is not quite 
the usual one axiomatized by Kolmogorov (1950). Unlike Bayesian inference, CSI assigns pro- 
babilities to statements rather than to subsets of Z, and the number assigned to a statement is not 
unique. Typically, CSI produces a statement Z, a number p e (0, 1), and a proof that either I is 
true or some event E has occurred whose probability wasp. If p « 1, the observer usually will 
feel safe in accepting I as true. The value of p is the "failure rate” for I, and 1 -p is the 
"confidence level" for I. The statement E is said to hold with failure ratep or at confidence level 
1 -p . An observer who always accepts statements when they have failure ratep will be wrong, in 
the long run, in a fraction of cases approximately no more than p . The nonuniqueness arises 
because if I is true with failure rate p , then obviously E is true with every larger failure rate. 
Ideally, one would like to calculate the greatest lower bound of the failure rates for Z, but this is 
often a very difficult calculation. 

The calculus of failure rates is somewhat different from the ordinary calculus of probability. 
To illustrate this, suppose that Zj and In are statements with failure rates p j andp 2 . Let £, be the 
event with probability p t - which must have occurred if Z,- is false. Letp t a p 2 denote the smaller 
of pi and p 2 . If Ei implies Ej, then failure of Z 2 means that I] is false, so £ : has occurred. 
Therefore Z 2 has failure ratepi. Consequently, if Ei implies Z^ then Z 2 has failure ratepi Ap 2 
as well as rate p 2 . The statement "Ei or Z 2 " is false whenever both Ei and Z 2 are false. (Here 
"or" has its technical, logical meaning.) Then both E { and £ 2 must have occurred. The probabil- 
ity of this compound event could be as high aspj Ap 2 , but not higher, so "Ei or E/ holds with 
failure rate <p x Ap 2 . The statement "Ej and Ej" is false whenever one of Ei or E^. fails, i.e., 
whenever one of £1 or £ 2 occurs. The probability of this compound event is no greater than 
p 1 +p 2 , so "E, and Z 2 " holds with failure rate <p x +p 2 . If Ei is certainly true, thenp 1 =0, and as 
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special cases of the above calculations, "X, or 1^' holds with failure rate 0 (in fact it is certain), 
"Si and I 2 " holds with failure ratep 2 , and if Ij implies 1^ then L* holds with failure rate 0 . In 
fact, if 'Ll implies In and I] is certain, then obviously so is In. 

The goal of CSI is to construct for each p e (0, 1] a set K z (p)zZ, and to prove that ifz is 
the true prediction vector then 

ze K z (p) (3.2) 

with failure rate <p. It is desirable that the set K z (p), called a confidence set for z, be as small 
as possible, in a sense to be discussed later. The proof that (3.2) holds with failure rate <p will 
make heavy use of the rules of inference described in the preceding paragraph. 

Given p e (0, 1 J, the observer begins the construction of K z (p) by choosing for each 6 e © 
a set K y (p,0)e L Y such that 

p(0,K Y (p,0))>l-p . (3.3a) 

For each Qe 0, the set K y (p ,6) can be chosen in any way whatever, subject only to (3.3a). 

Now the 6 0 which appears in (3.1e) solves (3. If), so p (0 O , •) is the true probability distribu- 
tion p Y . Therefore the probability that y (0) e K y (p,6 0 ) is at least 1 -p , so the statement 

y (0) e K y (j>,e 0 ) (3.3b) 

holds with failure rate <p . By hypothesis, 

77 g . (3.3c) 

Since (3.3b) has failure rate <p and (3.3c) has failure rate 0, the conjunctive statement "(3.3b) 
and (3.3c) are both true" has failure rate <p +0=p. But the truth of this conjunctive statement, 
together with ( 3 . 1 a), implies 

>- (0) e K y (p,6 0 ) + S y . (3.3d) 

Therefore (3.3d) has failure rate <p. The offending event E with probability <p which must 
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occur if (3.3d) fails is the event that (3.3b) is false; i.e., 

y (0) e Y \K Y (p,9 0 ). < 3 - 3e > 

Next, for each y e Y the observer defines a set, namely 

K^(p,y) •= {9 .0 e © and y e K Y (p,0) + S Y ) . (3.30 

Clearly ic\p, y)c ©. Furthermore, 6 e K e (p, y) iff (if and only if) y e K Y (p,9)+S Y . There- 
fore (3.3d) is true iff 

6 0 eK%, y (0) ), ( 3 -3g) 

so (3.3g) has failure rate <p . Finally, the observer defines the set 

K z (p):=G (K & ( p , y (0) )) + S z . < 3 - 3h ) 

Statements (3.1e) and (3.1g) are certain and (3.3g) has failure rate <p. If K z (p) is defined by 
(3.3h), then (3.2) is a consequence of the simultaneous validity of (3.1e), (3.1g) and (3.3g). 
Therefore, by the failure rate rules of inference, when AT z (p) is defined by (3.3h) then (3.2) holds 
with failure rate <p. The offending event with probability <p which must occur if (3.2) fails is 
(3.3e). 

The observer has great freedom in choosing K Y (p,6), the only restriction being (3.3a). 
This freedom can be exploited to make the set K z {p) in (3.3h) as small as possible. But what 
does "smalT mean for a subset of Z if dim Z > 1? No length or volume has been defined on Z , 
and "small” may mean different things for different components z,- , since those components may 
be measured in different physical units. If dim Z = 1 , the "size” K z of a subset K z c Z is easy to 
define because Z is simply the real line R . A convenient definition is 

| K z | := l A (sup K z - inf K z ) (3.4a) 

w'here sup K z is the supremum or least upper bound of K z and inf K z is the infimum or greatest 
lower bound oiK z . The quantity 2 \K Z \ is called the "diameter” of K z because 
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2\K Z | =sup { |«-w| : u,w e K z ) . (3.4b) 

Therefore \K Z | might be called the "radius" of K z . Definition (3.4b) could be extended to the 
case dimZ >1 if | u | were defined in that case, but it is not. A useful consequence of (3.4a) is 
that if K z and K z are any subsets of R then 

l*f +K% | = |K? | + \K\ | , (3.4c) 

with equality, not merely inequality. 

Bayesian inference and stochastic inversion lead naturally to prediction spaces Z with 
dimZ >1, but in confidence set inference it is both possible and desirable to consider only 
dimZ = 1. The goal of any inversion of data is numerical prediction. The point of considering 
the predictions z i ,..., z P together as a single prediction vector z = (z z P ) is that for any func- 

tion h : Z — > R a confidence set for the numerical prediction h (z) is obtained easily from the 
confidence set for z. If ze K z (p) with failure ratep, then clearly h(z) e h (K z (p)) with failure 
rate p. But trying to treat all functions h'.Z->R simultaneously has a cost. It forces a 
compromise in choosing the K r (p,6 ) for (3.3a): none of the confidence set radii | h (K z (p)) \ 
should be immoderately large, whatever the function h \Z-*R. Clearly, for any particular h, 
the smallest value of \h(K z (p))l is achieved by ignoring other functions h ’ when choosing 
K r (p,6) to satisfy (3.3a). For any particular prediction h (z), K Y (p, 6) should be tailored to the 
particular function h : Z -*R. But then the observer is simply replacing the function G :Q—>Z 
in (3 . 1 e) by the function g :'&->R, where f =h o G , the composite of h with G . Thus in CSI 
the observer is really dealing with the case dimZ = 1 in (3.1). This point of view has been made 
practical only by the advent of inexpensive and powerful computers. Before them, economy 
required that the data be "reduced" to a multidimensional model and its error estimates. All indi- 
vidual numerical predictions were then computed from the model. 
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There may remain cases where an observer really needs a multidimensional prediction for 
itself rather than as an intermediate step toward various one-dimensional predictions. The present 
paper does not discuss these cases. Henceforth, it will be assumed that dimZ = 3, so Z =/?, and 
the function G :©—»/? in (3.1c) will be written g :©—»/?, while z will be written z and £ will 
be written £. Since the codomain of g is R , g is a functional, the prediction functional. 
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4 CONFIDENCE SET INFERENCE IN GEOPHYSICAL INVERSE PROBLEMS 

Most geophysical inverse problems, even nonlinear ones, permit considerable simplification of 
the general formalism for CSI given in section 3. To see this, it is necessary to state the typical 
geophysical inverse problem at an almost pedantic level of precision, as follows. 

The observer has measured the components of the data vector y (0) =(y{°K—, y^) in the 
data space Y=R° , and wants to use y (0) to predict the value z of a certain numerical property of 
the earth. As discussed at the end of section 3, the prediction is only one real number rather than 
several. The tools available to the observer are these: (1) an arbitrary set X called the model 
space, whose members x will be called earth models: (2) a function F :X ->Y called the data 
function; (3) a functional g : X -4 R called the prediction functional: (4) a known subset S Y of Y ; 
(5) a known subset of S z of R ; (6) an m -parameter family of probability distributions on Y. The 
case m =0 is permitted, and then the "family” will consist of a single known probability distribu- 
tion p on Y. If m >0, the m parameters will be collected into a single parameter vector 
0 in the parameter space Q=R m , and the probability distribution with parameter 

vector 6 will be written p (0, •). The probability which p (6, •) assigns to the subset Q c Y will be 
written p(6,Q). All the p (6, •) must have the same domain Zy , a c-ring of subsets of Y. More- 
over, Zy must be closed under various operations of Euclidean geometry'. Rather than list these 
in detail, the present paper assumes simply that Zy consists of all subsets of Y to which a 
Euclidean volume can be assigned, i.e., the Lebesgue-measurable sets (Halmos, 1950), so Zy will 
include all open and all closed subsets of Y . 

The information available to the observer is the following: a vector 8y was drawn at random 
from a population in Y whose probability distribution is p E . This p E may be unknown but is 
known to belong to the given m -parameter family. That is, there is at least one 6 e © such that 

P£=P(0.O- (4.1a) 

The observer also knows that there are vectors xeX, 7] eY and a real number £ such that 

n e s Y , (4.ib) 
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CeS z , (4.1c) 

yW = F(x)+T] +Sy , (4. Id) 

and 

z=g(x)+C- ( 4 - le ) 

In (4.1), x will be called a correct earth model, 77 will be called the systematic error in modelling 
the data with x, £ will be called the systematic error in modelling the prediction with x, and Sy 
will be called the random error of observation. 

The geophysical inverse problem (4.1) is a special case of the statistical inference problem 
(3.1). To see this, introduce the following definitions: 


6:=GxX (4.2a) 

e := (0,X) (4.2b) 

y® := F (x) + 8y (4.2c) 

g(6) := g(x) (4 . 2 d) 

and, for every Q e and every (6,x) e ©, 

p(e,Q):=p(d,Q-F(x )). (4.2e) 


To use this correspondence between (4.1) and (3.1), the "parameter vector" 6 must be permitted 
to be a member of an arbitrary set ©, not necessarily an m -tuple like (3.1c). The assumption 
(3.1c) was never used in section 3, so abandoning it does no harm, and now (4.2b) makes sense. 
In practice, X is usually a linear space or a manifold of some dimension N , and then 6 in (4.2b) is 
exactly of the form (3.1c) with m —m+N. In this special case, the desirable but not essential 
condition (3. Id) becomes 

m + dim X « dim Y . (4.3) 

The formal application of CSI to (4.1) requires neither (3.1c) nor (4.3), but (4.3) does permit tests 
of the statistical hypothesis (4.1a), that Sy was drawn at random from a population described by 
some member of the family p ( 0 , •)• 
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In principle, the family p (6, •) can also depend on x, but the simpler problem (4.1) covers 
most geophysical inversions. In that simpler problem, some of the flexibility available in choos- 
ing a K Y (p,9) is not very useful, and can be sacrificed to the goal of economy. To construct the 
special kind of K r (p,6) appropriate to (4.1), for each Be © choose a set K r (p,B)e such 


that 

p(B,K Y (p,0))Z\-p (4.4a) 

and define (in the notation of A.3d) 

K Y (p,0) := F(x) + K r (p,6) (4.4b) 

where 6 = (9 , x) as in (4.2b). Then (3.30 becomes 

'= {(0,x) .6 e © and xeX and y e F(x) + K Y (p,0) + S Y } . (4.4c) 

Now, since dimZ = 1, (3.3h) is written 

K z (j>) = gtf%,y {0} )) + S z (4.4d) 

and, at confidence level > 1 -p or with failure rate <p 

z eK z (p). (4.4e) 


The description of K z (p) can be simplified further. For any subset W c © define Fl x (W ) 
as 

n^Oi 7 ) := {x : x e X and there is at least one 6 e © such that (0,x) e W } . (4.5a) 

Heuristically, n x (M~) can be thought of as the shadow of W cast on X by light rays parallel to 0. 
According to (4.2d), for every subset W c ©, 

|(W) = g(n x (W)). (4.5b) 

Define K x (p , y) c X as 

K x (p,y) := Tl x (K% , y)) . (4.5c) 

Then the various definitions imply that 
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K x (p, y) = {x : x e X and there is at least one 0 e 0 suchthat y e F (x) + K r (p,6) + S Y } . 

( 4 . 56 ) 

Furthermore, from (4.5b) and (4.5c), 

g (K% , y (0 >)) = g (K x (p , y (0 >)) (4.5e) 

so finally 

K z (p) = g(K x (p,y {n> ))+S z (4.50 

and, with failure rate <p , 

2 e K z (p) , (4.5g) 

The observer still has great freedom in constructing the sets K Y (p,6) to satisfy (4.4a), and 
this freedom can be used to minimize | g (K x (p , y^)) | . By (4.50 an d (3.4c), the result will be to 
minimize | A' z (p) | . To achieve the minimum possible value of | g (K x (p , y (0j )) | can require 
intricate statistical theory; the present paper will try to make | g (K x (p , y (0) )) | small, but not 
always as small as possible. 

5 LINEAR INVERSE PROBLEMS WITH KNOWN ERROR STATISTICS AND AN 
INJECTIVE DATA FUNCTION 

The solution (4.5) to the inverse problem (4.1) is that z e K z (p) at confidence level > 1 -p, or 
with failure rate <p, where K z (p) is given by (4.5f)- This "solution" is incomplete, because no 
particular choice has been made for the sets K Y (p,6) in (4.4a). The present section shows one 
way to choose the sets K r (p,6) and thus complete (4.5f) to an explicit solution, in what is 
perhaps the simplest inverse problem of all, the linear problem with known random error statis- 
tics and an injective data function. 

In this simplest version of problem (4.1), X is a linear space, F : X Y is a linear injection, 
g :X ->R is a linear functional, and the observer knows p£, the probability distribution in Y for 
the random error vector Sy which appears in (4. Id). 
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As noted in appendix B, p E induces a unique dot product on Y , and this makes Y a finite- 
dimensional Hilbert space. Therefore p E can be discussed in the language of tensors on Hilbert 
spaces, as set out in appendix C. The expected value of <5y, £[<5y], is known. The observer can 
redefine y (0) as y (0) -£ [<5y], thus achieving the result 

£[<5y] = 0. (5.1a) 

Then the variance tensor of p E is simply £ [(<5y)(5y)], and in terms of the dot product on Y deter- 
mined by p E , 

£[(<5y)(5y)]=I r (5.1b) 

where Iy is the identity tensor on Y . 

Let Pfpc) and Q E (x) be the orthogonal projectors of Y onto F(X) and its orthogonal com- 


plement, FQC) 1 . Then 

P EQC) (F(x))=F(x) (5.2a) 

and 

Qfq rj CF(x)) = 0 (5.2b) 

for ever}' xe X . Applying Pfqc) and Qfqc) t0 (4- Id) gives 

p f (x )( y < 0 ) ) =f(x) + Pf<x)(v ) + p F qc)(8. y ) ( 5 - 2c ) 

Qf (x ) Cv (0) ) = Qfqc)( t 1) + Qf(x )(<5y) • (5 .2d) 


Now £ :X — > Y is an injection, and thus so is (£|X) :X — »£(X). But F\X is also a surjec- 
tion, and hence is a bijection. Therefore, (£|X) _1 :F(X)^X exists. For brevity, this inverse 
function will be written simply F~ l , so 

F~':F(X)^X. (5.3a) 

Note that £ -1 (y) is defined only for ye £(X), not for all ye Y. Therefore, £ -1 could not be 
applied to (4. Id). But (4. Id) has been split into the two pieces (5.2c) and (5.2d), and £ -1 can be 
applied to (5.2c) to give 
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x = x (0 > - F" 1 o P F qc)(TI ) — F- ] o P F(X) (8y) (5.3b) 

where 

x (°) = F- , o P F(X) (y (0y ) (5.3c) 

and F _1 o P F qc) is the composite (A.la) of F -1 with P F (xy Substituting (5.3b) in (4.1e) gives 
z=: (0) + ^-^or 1 o P F qc)(V)~8 o F _1 o P FQ !)(5y) (5.3d) 

where 

2 (0) = g (x (0) ) = g o F" 1 o P F(X) (y i0) ) . (5.3e) 


In this simplest inverse problem, (5.3d) contains the germ of the solution (4.5e,f)- Since p E 
is known, there is no parameter vector 0 in (4.1a), and m, the number of parameters, is zero. 
Then in (4.4a), K r (p,6 ) does not depend on any 0, so (4.4a) becomes the requirement that a set 
K r (p) q Y be chosen so that 

p E (K Y (p))>\-p . (5.4) 

To motivate the particular choice of K r (p ) to be suggested here, consider the linear functional 
g o F -1 o P F (X)‘- r ->R. Since Y is a finite-dimensional Hilbert space, so is its subspace F(X). 
Therefore there is a unique vector ye F(X), such that for all ye F(X), g oF _1 (y)=yoy. But 
then, for all ye Y, 

8 o F _1 o P F( xfo) =7 ay (5.5a) 

because g o F" 1 o F F( y ) (y)=ynF F(X )(j , ) = f > r(X)(7)Dy=yny. It is also true that 

II* o F- 1 !! = ||y|| (5.5b) 

where ||yl| := (y cy) w and ||g o F -1 |i is defined as in appendix C. Hence, ify :=y/||y)|, then 

y=|lgoF" 1 ||y (5.5c) 

and 

llfll = 1 • (5.5d) 
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Then 

g o F' 1 o )’) = II £ o F J ||(y □ $y) • (5.5e) 

Now consider the real random variable y □ Sy. Its probability distribution, p-, is the margi- 
nal distribution of p E on the one-dimensional subspace of Y spanned by y. If V is any 
Lebesgue-measurable subset of the real line then 

Pf(V) =p^(vf + {r} x ) - < 5 - 6a ) 

The mean and variance ofyo5y are 

£[yc<5y] = 0 (5.6b) 

and 

E [(y o 5y) 2 ] = 1 , (5.6c) 

because £[fo<5y] = y □£[<%'] = yoO = 0, and £[(yo<5y) 2 ] = E[(ya8y)(5yay)] = 
£[yn(5y)(5y)oy] =yo£[(5y)(<5y)]cy =yoIy ay = yoy= 1. 

To construct a K r (p) in (4.4), use the notation of appendix A for open and half-open inter- 
vals. For any v e [0,°°), define 

p(y,v)=/?-([v,oo))+p-((-oo,-v']) . (5.7a) 

For simplicity, suppose that p-(V)- 0, whenever V contains only one point, and that p~(V ) > 0 
whenever V is an open interval. Then 

p(f,0) = 1 , (5-7b) 

and as v increases from 0 to °°,p (y,v) decreases monotonically from 1 to 0. Therefore the func- 
tion p(y, •) : [0, «>) — » (0, 1] has an inverse function, p(y, -) _1 : (0, 1] — > [0, «»), which will be writ- 
ten v(y, -):(0, l]-»[0, °°). The probability is p that \yuSy I >v(y,p). This motivates the 
definition 

K r (p) := {y:y € Y and |ypy| <v(y,p)} . (5.7c) 

Then 5y e K Y (p ) with probability p . Therefore 
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if □ <5y| <v(f,p) (5-7d) 

at confidence level 1 -p, or with failure ratep. 

Now for any positive a e R , let 

B z (a) := {2 : z e R and |z|<a). (5.8a) 

Clearly 

\B z (a)\=a (5.8b) 

in the sense of (3.4a). Also, from (5.5d) and (5.7d), 

g o F~ x o P FQn (5y) e B z (\\go F _1 ||v(f ,p)) (5.8c) 

with failure ratep. Therefore, from (5.3d), 

z e K z (fi) < 5 - 8d ) 

with failure ratep, where 

K z (p) = z (0) + S z +g o F~ l o P F $ ) (S Y ) + B z (\\g o/ r_1 ||v(f,p)) . (5.8e) 


Often, S z and S r , and hence g oF~ l oP FQr) (S Y ), will be s>Tnmetric about the origin; that is, 
S z =-S z and S Y =-S Y . In this case, the notation (3.4a) provides a simple way to restate 
(5.8d,e), namely 

|z-z (0) | < |S Z | + \goF-'oP FQC) (S Y )\ +\\goF- l \\v(y,p) (5.8f) 

with failure ratep. 

Of the three terms on the right in (5.8f), | S z \ will be easy to evaluate because a symmetric 
S z is usually defined simply by giving \S Z |. For the second term on the right of (5.8f) it is 
always true that 

|^or 1 o% ) (S 1 ')| <||goF- 1 || |5 r | . (5.9a) 

where 

|5 r | := sup {|| y || :ye S Y ) . 
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The inequality in (5.9a) is equality if S Y is a ball, i.e., if there is a real p such that S Y = B Y ((3) 
where 

B y (P) := [ti :ri e Y and ||7j|| <f3] . (5.9b) 

However, (5.9a) will not be the best estimate if, as often happens, S Y is a parallelipipcd. In this 
case there is a/J = (/Jj, ...,p D )e Y such that S Y =C r (p) where 

C Y (P ) := {77 : 77 e Y and \rj t | £ |/? ( I for i = 1 D ) . (5.9c) 

If S r =C Y (P) but the <5y, are correlated, then the edges of S Y are not mutually orthogonal in Y, 
and calculating | g o F~ x o Pfqc)($y) I becomes a problem in linear programming on a graph with 
2° vertices in N dimensions, where N = dim X . 

It remains to discuss the last term on the right in (5.8f). The easiest way to evaluate 
||g o F“ 1 || is to use the identity (5.5b). To evaluate v(y,p), one must do the integrals with respect 
to dpE in Y required to evaluate p- in (5.6a), and then one must carry out the integrals over the 
real intervals in (5.7a), to evaluate p(y,v). The inverse function of p(f, •) is v(y, •). In one spe- 
cial case, all these calculations can be done almost in closed form. If p E is Gaussian, then p * is a 
Gaussian distribution on R with mean 0 and variance 1. There is only one such one-dimensional 
Gaussian, so p- does not depend on y, and (5.7a) becomes 

oo 

p(v) = (2/7t)'*jdZe-' / f\ (5.10) 

V 

The inverse function, v(p), is tabulated; a partial list of its values appears in Table 1. Rational 
approximations are given by Abramowitz and Stegun (1964, p. 298). 

+-H-++++-H-+Table 1 near here+-H-++++++++ 

The solution (5.8d,e) or (5.8f) has a simple interpretation in terms of least-squares fitting. If 
V is any subspace of Y, and ye T, then the projection P v { y) is that \ eV which minimizes 
||y-v|| (Halmos, 1958). This is the ve V which provides the best fit to y in the sense of 
weighted least squares, the weight matrix being the inverse of the variance matrix of the random 
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errors (see appendix B). Thus the x (0) of (5.3c) is that member of X such that F(x (0) ) gives the 
best fit to y (0) in the sense of weighted least squares. The z (0) of (5.3e) is the value the prediction 
would have if x (0) were actually the correct earth model and there were no errors. 

Of the two equations (5.2c, d) into which (4. Id) was partitioned, only (5.2c) has been used 
so far. If F(X) = Y, then Q F{X) =0, and (5.2d) has no content. On the other hand, if 

dim X « dim Y (5.1 1) 

then, since m =0, (4.3) holds, and it is possible to use the data vector y (0) to test the hypothesis 
that 8y was drawn at random from a population whose probability distribution is p E . In the sim- 
plest case, |S y | « 1 so \\Q F (X)(7J)II « 1 in (5.2d). Since any component of 6y has variance 1, it 
follows that Q F( x)(V) can be neglected in (5.2d). Then that equation asserts that £>f(X)( J' ( 0) ), the 
residual part of y (0) which remains after removing the best least squares fit, is itself a random 
sample from the population Q F (x)(8y). The probability distribution of this population is the mar- 
ginal distribution of p E on F (X ) x . Many tests of this hypothesis are available in the extensive 
literature on tests of fit (see, e.g., Kendall and Stuart, 1979, Ch. 30). As an example, suppose that 

p E is Gaussian. Let N =dimX =dimF(X), and let (5V+i Sd ) he an orthononnal basis for 

F(X) x . Then the D -N numbers Si oy (0) with i -D +1 should be independent random 
samples from a one-dimensional Gaussian population with mean 0 and variance 1. There are a 
variety of tests of this hypothesis (Kendall and Stuart, loc cit), one being the Kolmogorov- 
Smimoff test. Another test uses the fact that 

s 2 :=(D-N)- 1 £ (Si oy (0) ) 2 (5.12a) 

i'==A T +l 

is the sum of D -N independent identically distributed random variables, so the probability dis- 
tribution for s 2 is nearly Gaussian (the exact distribution for ( D ~N)s 2 is chi-square with D -N 
degrees of freedom). The mean of s 2 is 1, and its variance is 2(D -A 7 ) -1 , so the probability is 
nearly 1 -p that 

|s 2 -l| <v(p)[2l(D -N)]'* . (5.12b) 
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Thai is, if Pe is Gaussian then (5.12b) holds with failure rate approximately p. In (5.12b), v(-) is 
the inverse of the functionp(-) defined by (5.10). 

6 USING A PRIOR QUADRATIC BOUND TO MAKE THE DATA FUNCTION 
INJECTIVE 

The preceding section is based on three crucial assumptions: that the observer knows the proba- 
bility distribution p E for the random error Sy in the data vector y (0) ; that F :X ->Y and 
g :X —>R are linear; and that the data function F:X —>Y is an injection. The present section 
shows how to relax the third assumption, so as to treat realistic linear inverse problems in which 
dim X = «>°. Section 5 cannot treat such problems directly because if F is injective then 
dim X = dim F (. X ). Since F (X ) c Y . it follows that dim X < dim Y = D < ~ when F is injective. 

When dim X =«■ (or more generally, when F is not injective even though dim X <«>) the 
idea is to approximate the inverse problem (4.1) by another problem with an injective data func- 
tion. To generate this subproblem, it is necessary to invoke prior information beyond that con- 
tained in the original problem (4.1). In the present section, the new prior information will be a 
quadratic bound (1.4). The observer knows a constant q and a positive definite quadratic form Q 
on X ; and the observer knows that there are vectors xeX, 7j e S Y and a scalar £ € S z such that 
x, 77,£ satisfy (4.1) and x also satisfies (1.4). 

Appendix B shows that q~ ] Q defines a dot product on X. The dot product of x and X will 
be written xnx, and ||x||, the norm (or length) of the vector x, will be defined as ||x|| = (xdx) 1 ' 4 . 
Then (1.4) can be written as 

II x H ^ 1 . (6.1) 

Even knowing (6.1), the observer cannot reduce (4.1) to a problem with an injective data 
function unless certain further information is available about the quadratic form Q which appears 
in the prior bound, (1.4) or (6. 1). Specifically, the bound (1.4) must be a strong enough constraint 
on x that it makes the given data function F :X — »T and the given prediction functional 
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g :X —>R continuous in the norm ||x|| which q X Q defines. That is, lim ||x„ -x|| =0 must 

n — 

always imply lim ||F(x„)-.F(x)i[ =0 and lim |£(x„)-£(x)| =0. Since F and g arc linear, 

n — ►<» n — 

continuity at x = 0 is equivalent to continuity at ever}' xg X (Halmos, 1951). And since F and g 
are linear, continuity at x=0 is equivalent to boundedness (see appendix C for definitions and 
Halmos, 1951, for proofs). 

When dim A <«, F and g will always be continuous because they are linear, but when 
dimA=«> linearity does not imply continuity (Halmos, 1951). Continuity of F and g with 
respect to the norm ||x|| is an extra assumption about that norm. When dim A =~ and F is not 
continuous, CSI requires the observer to reexamine her or his prior knowledge of the earth and try 
to replace (1.4) with another such inequality strong enough to make F continuous. Backus 
(1970b) describes some methods ("quellings") for doing so. When dim A' =«> and g is not con- 
tinuous, the observer has two choices: finding a stronger Q for (1.4), or accepting a slightly dif- 
ferent prediction z A for which (4.1e) can be replaced by z A =g A (x)+CA> with a new prediction 
functional g A which is continuous. This is the tactic adopted by Backus and Gilbert (1968, 
1970). 

Besides assuring that F and g are continuous, which may require reexamining the prior 
geophysical information, the observer must also make sure that X is a Hilbert space with the dot 
product defined by the quadratic form q~ l Q . This is merely a mathematical technicality, and 
requires no new geophysics. To say that the dot product space A' is a Hilbert space is merely to 
say that it is complete in the norm ||x||; that is, all its Cauchy sequences have limits. This is a 
trivial requirement because if X is not complete it can always be completed (Halmos, 1951). 
Therefore it can be assumed without loss of generality that X in (4.1), (6.1) is a Hilbert space 
with the dot product defined by q~ l Q . In the present paper, it will be assumed that A is separable 
(Halmos, 1951). This assumption is not essential, but does simplify the notation, since it 
amounts to assuming that all orthonormal bases for A' can be indexed by the integers. In ever}' 
Hilbert space inverse problem known to the author, A' is separable. 
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Since X is a Hilbert space and £ is a bounded linear functional on X , there is a unique vec- 
tor g e X such that 

g (x) = g □ x (6.2a) 

foreveryx€ X. Moreover, if ||g|| := (gng)' / \ then 

llgll = llgll (6.2b) 

where ||£ || is defined as in appendix C. 

II £ II : = sup { | £ (x) | :xefi*(l)}, (6.2c) 

the ball B x (a) being defined for any real a as 

B x (a) := { x : x e X and ||x|| Set } . (6.2d) 

Also, since X and Y are both Hilbert spaces and F :X ->Y is bounded and linear, there is a 
unique tensor Fe f (see appendix Q such that 

F(x) = Fnx (6.2e) 

for every x e X . Moreover, 

Ill'll = l|F|| (6.2f) 

where, as in appendix C 

||F||:=sup{||F(x)||:xeB*(l)} (6.2g) 

and 

||F|| := sup { |y dF dx| :ye B r (l) and xe B*(l)} . (6.2h) 

Now the inverse problem (4.1), (6.1) can be written as follows: 

p E =P(0 , 0 (6-3a) 

77 e S r (6.3b) 

c € S z (6.3c) 

y(°) = F □ x +77 + (6.3d) 

z=gcix+£ (6.3e) 
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||x|! < 1 . (6.30 

In the present section, the number m of parameters in the parameter vector 6 is 0, so (6.3a) sim- 
ply means that p E is known. The precise statement of the inverse problem (6.3) is this: the 
observer wants to estimate z from what is known. What is known is p E , S Y , S z , y (0) , F, g, the 
fact that 8y was drawn at random from a population with probability density p E , and the fact that 
there is a triple (x,tj,£) which satisfies (6.3). 

The observer can approximate (6.3) in many different ways by a problem with an injective 
data function, and so is free to choose an approximation which minimizes the error in estimating 
z . Section 7 will describe one technique for minimizing this error. The present section discusses 
how to find all the approximations to (6.3) which have injective data functions. 

To produce any such approximation, let N be a positive integer, and let X^ be any N - 
dimensional subspace of X. It will not be necessary to assume that dimX =«>, but that is the 
most interesting case, so the notation will be tailored to it. Define 

JW** (6.4a) 

and 

X (h ooj t= Xfj ~ . (6.4b) 

Since X is a Hilbert space, 

X = X (Q'K ) © X (p (6.4c) 

This conventional notation for orthogonal direct sums is described in appendix D, and simply 
abbreviates the following remarks. For every' xe X there are unique vectors x (0iW) € X^^ and 
x (N,«o)€ X(n,«) such that 

x = x (o // ) + X (N ,oo) • (6.4d) 

These vectors have the additional property' that 

x (o jr) □ x (N _«,) = 0 (6.4e) 
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so that 

l|x|| 2 = ||x (0iN) || 2 + |ix (Ar „ ) || 2 . (6.40 

To approximate F and g , define 

F ( 0 f>) := F I X (Qjj) (6.5a) 

F(N,°o) := F | (6.5b) 

8 ( 0 //) ; = 8 I X (0A') (6.5c) 

8(N,oo) : = 8 I » (6.5d) 


and let the corresponding tensors and vectors be F (0A -)€ Y ®X (0Jsl y Y ®X (A r 

g( 0 fj)£ X(oji) and g (N,-) e X( Aoo ). Now an apparent ambiguity of notation must be resolved. 
There are two ways to construct g(o^) and g (A «,) from g : the construction g |-> g in (6.2a) fol- 
lowed by gh>(g (0<W ),g(Ar,-.)) as in (6.4d); or the construction g (g (0 .n )-£(*. ~>) of (6.5c, d) fol- 
lowed by g(o,w)l-»g(o/0 and 8 w g(N as in (6.2a). Obviously both constructions give the 

same result, so in fact the notation is unambiguous. 

Since F and g are linear, F(x)=F(x (0A r)) + F(x (A i „ o) ) = F (0<N) (x m) ) + F (A>oo) (x (A , _„)), and 


similarly for g . Therefore 

F d x = F( 0>A r) □ X( 0iA ) + F( A „) d x (A r j00 ) (6.6a) 

and 

g D X = g(0,A') D X(0 J\’) + g(V,~) Cl X (A ,„ } . (6.6b) 

Substituting the expressions in (6.3d, e) suggests the following definitions: 

V -= F( Ai „) □ X (A , + T) (6.7a) 

C(N~) : = g(N ,«.) D X (A , ,„) + C (6.7b) 

sk-) : = F^F^d)) + (6.7c) 

5f A .„) := g (*.„) (B X(A, -“>(1)) + S z (6.7d) 

where B (Ar, ~ ) is defined by replacing X with X ^ in (6.2d). Note that 

g (A - >oo) (B X ^( 1)) = F 2 ( ||g (A , || ) (6.7e) 
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where B z is defined by (5.8a). The preceding definitions permit (6.3) to be recast in the follow- 


ing form: 

Pe =P@c) 

*7(N~) e S(A',~) 

C(N,~) e 

y (0) = F(o//) D X (0A') + t 1 (n , oc ) + Sy 
2 “ g(0//) DX (0^) + C(W ,~) 
l! x (0,w)ll - 1 • 


(6.8a) 
(6.8b) 
(6.8 c) 
(6.8d) 
(6.8c) 
(6.81) 


To obtain (6.8) from (6.3), note that (6.8a) is (6.3a) and that (6.8d,e) are obvious consequences of 
(6.3d, e), (6.6), and (6.7a, b). The bound (6.81) follows from (6.30 and (6.41). Assertions (6.8b, c) 
follow from (6.3b,c) once it is shown that F (N ^ a x (/ , ■ ^ e F ^(B x ( 1 )) and 
g(Ar,»)n g (N ^fB x (\)). These claims are established by noting that (6.31) and (6.4f) 
imply ||x (N ,.)l| £ 1. 

The problem (6.8) is formally identical with (6.3), but now the data space is A' (0iN) with 
dimX( 0 ^)=A r <°°, and the data function and prediction functional are F^y.X^^Y and 
These are approximations to the original F and g , and the errors of approxi- 
mation, the truncation errors, are included in (6.8b,c). 

Nothing in the foregoing construction of F (0 ^) assures that it will be injective. However, it 
can always be restricted to be so. Let N (0-W ) be the null space of F (0Js! y That is. 


N(o^) : = { x : x e X ) and F(o,n)(x) = 0 ) . (6.9a) 

Then (F (0j/ v) | N ( ojv>) : No.v) - ^ T is necessarily injective. To prove this, it suffices to prove that 
F (o^) | N ( o^) never assigns 0 to a nonzero vector in its domain (Halmos, 1958). But if x e 
and F m )(x) = 0, then also x e N (0 ^), so x is orthogonal to itself, and hence x = 0. 

Now let 

M = dim N(o^) (6.9b) 

and in the argument leading from (6.3) to (6.8) use 


I 
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= N(0JV) (6.9c) 

instead of Xa,. Then X (0M) =X M =N ( oa) and the new F (0M) will be an injection ofX {0M) into Y. 
Thus section 5 becomes applicable. The numerical calculation of X (0M) and F (0W) from X (0 ^) 
and F( 0 //) can be done quite explicitly, since F (0Jif) = ^(o/v) I %(0M) = F I ^(ojvy For any vector 

xe%), the vector F <0 iN) (x) in Y can be written (F/o^OO Ff 0A>) (\)). For any 

i € {1 D ) the real number F( 0 iA ')(x) depends linearly and continuously on xe X^y There- 

fore F( 0r v) '-X^y-tR * s a continuous linear functional on X^y Hence there is a unique vec- 
tor F( 0 i a)€ X ( o,n) such that 

F (o^ ) (x) = F (o^ ) □ x (6.9d) 

for every xeX w . The functional will be called the ith restricted data functional and 
F( 0 jv) will be called the i th restricted data kernel. The space N^) is spanned by the D restricted 
data kernels F ( J 0/o , Ffay 

One particular choice ofX N will assure from the outset that F^y.X^^Y is injective, 
so that the extra step described in the preceding two paragraphs is unnecessary. Let Np be the 
whole null space of F , that is 

NV := {x : x e X and F(x) = 0 } . (6.10a) 

Writing F (\) = (F l (\) F D (x)) defines the D data functionals F‘ :X ~>R, and (6.2a) provides 

the corresponding data kernels F' e X (Backus & Gilbert, 1967). The space N/ is spanned by 
F 1 , .... F°, so dim N/<£> . Tak tN =dimN/ and 

X N = Nf . (6.10b) 

When this X N is used to obtain (6.8) from (6.3), clearly F (0 //) : X (0> n) -» Y will be an injection. 

Choosing X N =N p avoids the extra step (6.9) in the approximation process for (6.3), and 
has the further advantage that with this choice 

F (A>) = 0 (6.10c) 

and (6.7c) becomes 
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: = • (6-10d) 

In other words, (6.10b) produces the smallest possible truncation error (namely 0) among all 
choices of subspaces of X. However, (6.10b) has one serious disadvantage: it requires at least 
some numerical computation with DxD full matrices, and D is often very large. For satellite 
magnetic data, D is typically of the order of 10 4 . Another problem is almost certain to appear in 
(6.10) but can also arise in (6.9). The data functionals F l , ...,F N in (6.10) or F X ,...,F M in (6.9) 
may be linearly dependent or nearly dependent. Then a very small subset of them may carry all 
the information in y (0 ^ which exceeds the noise, tj +Sy. Computations with the whole linearly 
independent set of data functionals will be numerically ill-conditioned and wasteful of computer 
resources. All these questions are addressed in the next section, which shows a way better than 
(6.9c) to choose X M , whether X N is arbitrary or is given by (6. 1 0). 
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7 RESOLUTION IN CONFIDENCE SET INFERENCE 

When dim A" =°° and the observer accepts a prior quadratic bound on the correct earth model x, 
the choices (6.9c) and (6.10b) provide two approximations to the linear inverse problem (6.3). 
Each approximation provides a finite-dimensional model space X , an injective data function 
F :X —>Y, and known confining sets S Y and S z for the systematic errors tj and £. Therefore 
both approximations are amenable to treatment by section 5. However, both approximations are 
potentially subject to a serious numerical defect. Even though F -1 :F(X)-+X exists in both 
cases, ||F -1 || may be so large that ||g o F _1 || is too large to make (5.8d,e) or (5.8f) a useful esti- 
mate of z. Since F is linear, all that is required to make F injective is that F(£)*0 for all 
| e 95* (1), where 9B X (1):= [LxeX and ||x|| = l}. Nothing prevents F(<f) from being very 
small for some such £, so that ||F -1 || is large. Physically speaking, if e dB x (\) and F(£) is 
smaller than the error rj +5y, then whatever component £ ox of is present in the correct earth 
model x, that component will contribute to a "signal" which is below the noise, because (6.3f) 
requires |£ ox| < 1. Then, even though F(£)* 0, it is fruitless to try to resolve |nx and use it in 
estimating z . 

In Bayesian inference and stochastic inversion, the computations required to pick out the 
resolvable parts of the earth model x in problem (6.3) are well-known (Jackson, 1979; Gubbins 
and Bloxham, 1985; Backus, 1988a). The present section describes the corresponding calcula- 
tions of resolution for confidence set inference (CSI). 

The first step is to use the prior quadratic bound to approximate (6.3) by a finite- 
dimensional problem (6.8). There is no restriction on the X N used here, and either (6.9c) or 
(6.10b) is acceptable. It will be assumed, however, that 

N = dim Xft < dim Y = D . (7.1a) 

A more complicated notation would make (7.1a) avoidable, but the effort is of doubtful value, 
since (7.1a) holds in most practical inversions, and certainly in (6.10b). 
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Resolution in CSI is based on the eigcnstructure or singular value decomposition (SVD) of 
the finite-dimensional approximate data function F (0 ^y. A' (a/V -)— » Y. The eigcnstructure is avail- 
able because both and Y are Hilbert spaces; and the eigenstructure is physically useful 

because the dot products on X and Y arise from real physical information about the correct earth 
model X and the random error 5y. Appendix D describes SVD in the form most convenient for 
the present discussion. 

Let the eigenfactors or singular values of F (ojy) in descending order be 
>d 2 > • • • ><p N >0 (7.1b) 

where multiple eigenfactors are repeated according to their multiplicities. Let {£j .... } and 


{j’j, n ) be orthonormal subsets ofX^-^ and Y such that 

Fw)(Si)=<h Si (7-lc) 

for i = l,...,N. Since N= dimX^y {£j H N ) is an orthonormal basis forX (0 jv) and can be 

extended to an orthonormal basis for all of X , written {£j , $ 2 , ' ‘ ' ) • For each integer i define 

ft** go*,- (7-ld) 

and 

X t X □ X; (7.1e) 

so that 

x=£x ( x, (7. If) 

i=i 

and 

g=£ (7-lg) 

i=i 


Here x and g are the earth model and prediction functional appearing in (6.3). 

Calculation of resolution is facilitated by the following notation. For any integers p and q 
such that 0 <p<q, define X ^ as the subspace of X spanned by {x p+ i, ...,x q }. Define 
X(pj>) = {0}. ^ let X^^ be the subspace of X spanned by {^+j,^ + 2> ' ' ‘ For the special 
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cases A'(ojv) and ^(N,~) dicse definitions agree with those already introduced in (6.4). If 
0<p <q define 

'W'f I ; (7.2a) 

«0.,r=« ^ 

•W-fCW: < 7 - 2c > 

< 7 - 2d > 

the orthogonal projector of X onto ; 

'U’-'V,,. < 7 - 2c > 

the orthogonal projector of Y onto Y(p, q )l 

x (p.?) :=f (Sm) (*): ( 7 - 2f ) 

and 

y$W-= p U)<y (oy )- ( 7 - 2 s) 

This notation leads to a number of useful identities, some of which are expressed most 
easily in the tensor language of appendices C and D. If 

0<p ^ q °o (7.3a) 

then 

x (p,?)= £ *< x ; ( 7 - 3b ) 

i=p+l 

g(p^)= £ (7.3c) 

and 

Pj. 9) = £ M/ (7.3d) 

where a sum is understood to vanish if its upper limit is less than its lower limit. If 

0<p<q<N (7.4a) 

then * 
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P{*)* £ Si Si (7.4b) 

i=p + 1 

and 

F (Pi<7) = £ <hSi*i • (7.4c) 

I —p •+■ 1 

If 0 <p < 4 and q is such that 

<t> q > 0 (7.5a) 

then (7.1b) implies that F (p ?) :A r (/ , i9) -^y ( p^ ) isa bijection whose inverse F^\) is rcpresented 
by the tensor 

F^. ? )= £ 4>r'*,Si- (7.5b) 

Here has as its domain 7^), but F^) can be inteipreted as a member of either 

y (p,q) otX ®Y. Equation (7.5b) is simply (D.lle) in appendix D. It is also important 
to observe that (7.5b), (7.4b) and the orthonormality of (>’i Sn ) imply 

F(p ,q ) E ,q)~ F(p ,q ) • (7.5c) 

If n is any integer satisfying 

0< n <N , (7.6a) 

then in the standard notation described in appendix D 

^(om = ^(o,«)®^(»>') (7.6b) 

X („,«,) = X( n ^ © X (7.6c) 

X = ^(0.«)©^(ujv)®^(W-)- (7.6d) 

Now the idea is to approximate (6.3) by each of the X (0n) for n = 1, ...,N and to try to select 
n to minimize \xf 0A ^{p )\ , the radius of the confidence interval with failure ratep for the predic- 
tion z . For each n in (7.6a), definitions (6.7a,b) become 

V ( n ,“)' = F(« ,•») ^ ,oc) 4" tj (7 ,6e) 

C (« .«») : = g(n ,~) D X (n ,») + C • (7.6f) 
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If n is small enough so that 

(t> n > 0 (7.7a) 

then F (p,n)'X ( 0 *)-* Y is an injection, and section 5 can be applied verbatim to the linear, finite- 
dimensional injective inverse problem whose terms in (4.1) are X :=A' (0n) , F:=F( 0n y 
8 8 (o,n )> x:=x (0 , B)> n :=!?(„,«). and £=£(„,»)• The solution, (5.3d) or (5.8e), can be slightly 

simplified because now X( 0n ) as well as Y has a dot product, and tensors are available. Using the 


tensor notation, (5.3c, e) become 

x $?n ) : = F(0 !»1 ) a ) (7.7b) 

and 

). (7-70 

Using (7.5c) makes it possible to write the solution (5.3d) in the form 

2=2 <i?3i ) + C(n,~)- g(Os ) D F(o 1 /I )D Ol („ ,») + Sy) . (7.7d) 

This last equation can be slightly simplified. Clearly F (n „ ) =F ( „ tW )+F (A f >00 ) and, from 
(7.4c) and (7.5b), 

F ( o!„) d F (niN) = 0 , (7.8a) 

so 

F(o!n) a F( Bi oo) = F(o!n)DF (A , i00 ) . (7.8b) 

Therefore, from definition (7.6e) applied to n and N 

F(o!n) orj (n ) qtj ( A ' >oo) (7.8c) 

and (7.7d) can be replaced by 

2 = 2 (§?„) +C(r ,-) - 8(0,n ) o F^ } □ (j] (A , >oc) + Sy ) . (7.8d) 


For the particular choice X A >=N p, i.e., (6.10b), clearly F( W oo )=0, so in (7.8d) can be 

replaced by ri, the original systematic error in (4.1b) and (4.1d). 
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For anyp € (0, 1] a confidence interval Kf 0n ^(p) can be written down from (7.8d) such that 

z e K(o,n ) (p) ( 7 - 9a ) 

with failure ratep. Here z is the correct value of the prediction. To construct K z 0n) (p), first 


define 

y ( o. B ) : =g(o.«)nF ( o!„) (7.9b) 

and 

f( 0 ^) : =y( 0 .»)/llr ( o.«)ll • (7.9c) 

Let 

v n (p) = v(y {0in) ,p) (7.9d) 

as calculated from p E via (5.7). Then (5.8c) and (7.5c) permit the assertion that, with failure rate 
P< 

1 8(o^i ) a F (o!») o Sy | < ||r ( o^)ll v n <P) • (7-9e) 

For any set Q £ Y and any y e Y, define 
yaQ := {yoy :ye Q } . 

Then (7.8d) implies that with failure rate <p 

2 G “(O.n) + " 7(0,n ) d ^(N,~) +fiZ (ll7(0^)ll V '«( D )) (7.9f) 


where v n (p) is given by (7.9d), y (0> „) by (7.9b), and B z by (5.8a). Using (6.7d,e) to evaluate 
S z n shows that the set on the right of (7.9f) is 

) (P) : = 2 (SI ) + S Z + B 1 ( II g(« ,~)ll +v„(p) lly ( o^i >11 ) - 7(o^i ) D 5 1 >oe) . (7.9g) 

Thus, when K z Qtn) (p) is defined by (7.9g), (7.9a) holds with failure rate <p. Usually, S z and 
SJn.oo) are symmetrica] about the origin, and then (7.9a) and (7.9g) together are equivalent to the 
assertion that, with failure rate <p , 

I 2 -z $n) I ^ |S Z I + llg(B,~)ll +v (”*p)ll7(0,n)ll + l7(0,n)°S(A’ ~) | . 


(7.9h) 



George E. Backus 


Confidence Set Inference 43 


The foregoing calculation can be repeated for any n satisfying (7.7a), and the idea is to 
choose n to minimize the right side of (7.9h). Therefore it is necessary to understand how the 
terms in (7.9h) vary with n . In the interest of simplicity it will be assumed that 


SJs'..)=B r 0), (7.10a) 

the ball defined by (5.9b). Then, by (7.8c), 

17(0,0 ° 4 .-> I =llr ( o^)ll/3 (7.10b) 

so (7.9h) becomes 

\z-z$ n) \S\S z \+T p (n) (7.11a) 

where 

T p (n) := ||g( B ,«)|| +lly (0 ^)lk„(p) (7.11b) 

and 

K„(p):=p+v(n,p).. (7.11c) 


Let M be the largest n such that <p n >0. The value of T p (n) can be calculated for 
n =0, 1, ..... M, and the minimum can be chosen by inspection. 

Why might the observer expect to find values of n e {0, .... M } for which T p (n ) is substan- 
tially less then T p (0) or T p (M)? A rigorous general proof seems unattainable, but a suggestive 
discussion can be given. From (7.11b), r p (0) = ||g(o,-)ll = ||g||. In this case, the estimate of z in 
(7. 11a) comes entirely from the prior information (6.30- The data vector y (0) contributes nothing. 
If the data are at all relevant to the prediction, then T p (n) should decrease initially as n increases 
from 0 and the data begin to contribute to the estimate of z. But why should T p {n) start to 
increase again before n reaches Ml 

The reason lies in a characteristic property of most non-trivial linear inverse problems. In 
an ideal problem, all possible data could be collected without error, and would determine a 
unique earth model x. In this ideal case, both the model space X and the data space Y are infinite 
dimensional Hilbert spaces, and the data function F :X —>Y is a linear injection. In the non- 
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trivial ideal problems, however, F is compact (Kato, 1976), so that <j> n — »0 as n — > In a real 
inverse problem with dim Y <**>, but with Y large enough to give some hope of approximating 
the ideal problem and obtaining useful limits on the prediction, the observer should expect to find 

</,„ « 1 (7.12a) 

as n increases toward M, and indeed this is commonly observed (Gubbins and Bloxham, 1985; 
Langel and Estes, private communication). 

To see more clearly how T p (n ) will vary with n , introduce the abbreviations 
K (" ) := II g( n ,.)ll 2 and S (n ) := )lr ( o^ )H 2 - Then fr o m (7.3c) 

«<*)=£ *? C7.12b) 

and from (7.4c). (7.5b) and (7.10b) 

S(»)=i>fV. (7.12c) 

1 “ 1 


Furthermore, 

T p (n)=R(n)' A + S(n)\ n (p) 
Define 


(7.12d) 


(7.13a) 


D p (n):= g- 2 [T p (n-\)-T p (n)] . 

Then, if the data are relevant, one can expect D p (n)> 0 for small n . If T p (n ) does have a 
minimum, then eventually D p (n ) will become negative. But a simple calculation gives 

D p (n ) = I/C (n-l)“ + R(n )*] -<p;hc n (p) [S (n- 1)“ + S(n )*] . (7.13b) 

Thus D p (jn ) > 0 exactly as long as 

K n (p)[R(n-l)»+R(n)' A ) 


4>n > 


<p n [S(n-l)»+S(n) l/t ) 


(7.13c) 


Gearly, if 0 <n<M , 
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/?(«— 1)* + *(«)*> 2 llg^ll 
while (7.1b) assures that 

+ £(«)*] S2||g ( o iW) || • 

Except for distributions (5.6a) with pathologically long tails (e.g., a supeiposition of two Gaus- 
sians with quite different variances) it will happen that v(y (0i „),p)> 1 for the interesting values of 
p (see Table 1). Then jc„ (p ) > 1 , and the right side of (7.13c) will always be at least 
llg(M,«)ll / llg( 0 ,M)ll- When n becomes so large that 

<Pn < llg(M,~)ll / II g(0>f )ll (7.13d) 

then D p (n ) < 0, and T p (n ) will increase with n . The minimum value of T p (rt) will certainly have 
been passed. 

The foregoing calculation of a confidence interval for z makes no explicit attempt to 
enforce the constraint (6.1), and the n which minimizes T p (n) may not satisfy 

||x (0(B) ||<l. (7.14a) 

As a result, the confidence interval K z (p ) is conservatively long, and could be shortened by 
enforcing (7.14a) as a further constraint. It is not clear whether the required calculations are 
worth doing. In practise, q in (1.4a) will not be known with any precision, and therefore the CSI 
estimate of | AT z (p) | will not inspire much confidence unless 

II x (0tB) || «1 (7.14b) 

at the value of n which minimizes T p (n). The cautious observer will compute || x (0>n) || and 
check (7.14b) as an essential part of CSI. For values of n near M, even (7.14a) is likely to fail, 
but such values of n are of no interest in estimating z . 
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8 ESTIMATING THE PROBABILITY DISTRIBUTION OF THE RANDOM ERRORS 

In sections 5, 6, and 7 it was assumed that the probability distribution p E for the random error 
vector Sy is known. In fact, under certain conditions p E can be estimated from the data vector 
y<°> The conditions are that p E be known to belong to a given m -parameter family of distribu- 
tions p (0, •); that there be a model space X such that F :X -4 Y is an injection; that (4.3) holds; 
and that |S y | « 1. These conditions must often be verified a posteriori, because |S r | depends 
on p E , that is, on the unknown value of 0. The optimal choice X =X (0n ^ in section 7 will also 
depend on this unknown 0. 

The recursive character of the foregoing statement of conditions under which p E can be 
estimated from the data is one result of the nonlinearity of the problem of estimating 0. It is 
difficult to make general statements about nonlinear inverse problems, but one class of problems 
admits a general solution by the method of maximum likelihood. This is the class in which all 
the probability distributions p(0,-) in the given family have density functions f (6, ): Y —*R. 
That is, for any Lebesgue measurable subset Q of Y, 

P(0,Q)= jf(P,y)dy l ■ ■ ■ dy D , (8.1a) 

Q 

a result usually abbreviated as 

dp(e,y)=f(e,y)d yi ••• dy D . (8.1b) 

Then the probability distributions (4.2e) all have densities given by 

dp{e,y)=f{e,y-F{x))dy x ■ dy D , (8.1c) 

where 0 = (0,x). The method of maximum likelihood estimates 0 and x from y® by choosing 
0® and x® to maximize / (0, y^— F (x)). The estimate is useful when 

A = dim Y - dim X - m (8.1d) 

satisfies A » 1. The A in (8. Id) is called the number of degrees of freedom in the inverse prob- 
lem. If/ (0,y (O) -F (x)) does have a unique maximum (0 (O) ,x (O) ), then 0 (O) and x (0 ^ are random 
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variables. Under mild conditions on/, the following facts are known (Cramer, 1946, ch. 33). 
Suppose that A » 1 and that for each 0 e ©, p (0, •) makes 8y j, .... Sy D independent; or alterna- 
tively, suppose that all the p (0, •) are Gaussian. Non-diagonal correlation matrices E [(<5y, )(5y;)] 
are permitted in the Gaussian case. The random variables 0 (O) and x (0) are approximately Gaus- 
sian, with variance matrices of order A -1 and means correct to order X~' A . From these facts, 
confidence sets K ¥ (p,0) can be constructed as in (3.3a), using (4.2b). The author has not yet 
carried out this program for any m > 1, but believes that the outcome will be the same as in the 
case m =1 to be described below. That is, for failure rates p which are not too small in a sense 
determined by the A of (8.1d) (see (8.5a)), there will be a positive p t «p and a confidence set 
K e (p , y®®) with two crucial properties: (1) the true 0 will be a member of AT e (p , y* 0 *) with failure 
rate pf, (2) all the p {0, ) with 0 e K e (p , y (0) ) will be nearly the same. Then p E can be taken to 
be any one of the p(0, ) with 0 e K e (p , y®), and this known p E can be used in sections 5, 6, and 
7 with failure ratep -p \ to produce a confidence set for z with failure ratep. 

This phenomenon is visible in one common formulation of the geomagnetic modelling 
problem, where m = 1. Here the data function F :X Y and the prediction functional g :X ->R 
are linear, and the probability distribution p E is believed to belong to the one-parameter family 
p (i 9 , •)» all of whose members are Gaussian on Y with mean 0 and with variance matrices which 
are 6 2 times the known variance matrix of p (1, •). This latter need not be diagonal. The usual 
statement of this problem is that the unknown parameter 0 is to be estimated along with the earth 
model x (0 >. In CSI, estimating 0 and 2 are separate problems, and neither requires the other, 
although the scheme presented here begins by producing an estimate for 6 whose failure rate is 
much less than the failure ratep desired for z . 

To begin the estimates, note that if Q is any Lebesgue-measurable subset of Y then 
p(0,Q)=p(l,e~ l Q). (8.2a) 

If h : Y — »/? , then E e [h (<5y)] will denote the expected value of h (Sy) under p (6, •). That is, 

E e [h (Sy)] := J dp (0, y )h (y) . (8.2b) 

Y 
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Then 

£ e [<5}’,] = 0 (8.2c) 

and, because of (8.2a), 

E e K8y,)(6y } )] =0 2 £ ,[(<!>>’, Xfy)] . (8-2d) 

Let (y □ y) e denote the dot product defined on Y by p(6, ) as in appendix B. Then (8.2d) makes 
clear that 

(y o y) e =0 _2 (yay)i (8.2e) 

and, in an obvious notation, 

llyll*=0 -1 llylli (8.20 

In particular, if/?(l, ) makes ft a unit vector in Y, then p (d, •) will make 0ft a unit vector, which 
prompts the abbreviation 

Se = 6$i (8-2g) 

The advantage of (8.2e) is that the notion of orthogonality in Y does not depend on 0. 
Therefore the orthogonal projectors of Y onto F (X) and F ( X ) x are independent of 6. All of sec- 
tions 5, 6, and 7 can be carried out for 6 = 1 , and the corresponding results for any 6 can be 
obtained by using (8.2e) to introduce appropriate powers of 6 in the various terms in sections 5, 6, 
and 7. For example, if <p;(6) is the ith eigenfactor of F :X -*Y in the sequence (7.1b) when 
p E =p (6, •), then 

*,-(#) =0-fy,-( !)• (8.2h) 

For any positive 8, let s (6) 2 denote the observed value of the random variable (5.12a) calcu- 
lated under the assumption that p E =p (d, •). Then (8.2e) implies 

s(e) 2 =e~ 2 s(l) 2 (8.3a) 

and (5.12b) becomes 

|0- 2 s(l) 2 - 1 1 < v(p ,)2 '\D -N)~' a , (8.3b) 
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If p E really is then (8.3b) is true with failure ratepj. Then so are all its consequences, 

including the assertion that 

e <s(m-v(p\)2'\D -ATV* • (8.3c) 

If p E =p(6, •), then (5.7d) becomes 

I (?e D 5 y)o I <v(p 2 ) (8 .4a) 

where v( ) is the inverse of the functionp(-) defined by (5.1 1), and y e is as in (8.2g). Then (8.2e) 
and (8.2g) show that (8.4a) is equivalent to 

I (ft o ^y)i I <0v(p2) . (8.4b) 

Since (8.4a) holds at failure ratep 2 , so does (8.4b). 

Confidence sets can be combined by the rules given in the paragraph following (3.3f). 
According to those rules, at a failure rate no larger than 

p=pi+p 2 (8.4c) 

both (8.4b) and (8.3c) are true. Therefore, at a failure rate no greater thanp , 

I (ft o <5y)j | <v(p 2 Ml)[l-v(p 1 )2'^ -AT’V* • (8.4d) 

Ifp is fixed, (8.4d) will be satisfied with failure rate <p for any choice ofpj andp 2 satisfying 
(8.4c), sop j and p 2 can be chosen to minimize the right side of (8.4d), and then (8.4d) can be 
used as in section 5 to obtain K z (p), a confidence set for the prediction 2 at failure ratep . 

In fact, unless (D -N)' A » 1, the v(pj) in (8.4d) really should be calculated from (5.10), 
the Gaussian density being replaced by the density appropriate to the chi-square distribution with 
D -N degrees of freedom. Here it will be assumed that (D » 1. Then, if the observer is 
willing to accept a failure ratep large enough to permit 

v(p) « (D , (8.5a) 

it will be possible to choose p 1 so that 

Pi«p (8.5b) 
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and yet 

v{pi)«{D-N)' /l . (8.5c) 

With this choice ofp h 

p ~p 2 (8.5d) 

and with an error of order v(p i) ( D -NY*, (8.4d) becomes 

|( 7 ] o<5.v)i | <v(p)s(l) . (8.5c) 

For any 6 satisfying (8.3b), (8.5e) is equivalent to 

l(?i o<5y)i I <v(p)0 (8.51) 

with an error of order v(p X )(D -NY*. By (8.2e) and (8.2g), (8.51) is the same as 
I ( Ye ° fy)e I < v(P ) • ( 8 -5g) 


Therefore, except for a fractional inaccuracy of order v(pj)(D -NY*, the statement (8.5g) is 
correct with failure ratep if 6 is any parameter value which satisfies (8.3b). 

In other words, if the observer will accept a failure ratep which satisfies (8.5a), then he or 
she can estimate 6 by choosing any value which satisfies (8.3b), and can use p E -p (9, •) to define 
a dot product on 7. All of sections 5, 6, and 7 can then be invoked with the p E , and the 
confidence sets thus calculated will be in error by factors of the order of 1 +v(p j)(D -NY*- 

Certainly there are interesting problems where dim 0 > 1. For example, the distribution of 
data errors may be the sum of two Gaussians, one with much smaller amplitude and larger vari- 
ance then the other, so that there are outliers in the data. Alternatively, the mean (Sy) may not be 
known to vanish. If nothing is known about (Sy), of course the data are useless. Often, however, 
the unknown mean can be parametrized by a family of distributions for which dim © « dim Y, 
and then the mean can be estimated from the data. This is probably true, for example, of the orbit 
errors in satellite geomagnetism, although the method has not been tried there. When dim © > 1, 
the estimation of 6 is more complicated then the one-dimensional illustration considered in this 
section. For example, the dot product in Y will depend on 6. Further work is required to discuss 
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definitively the general case, that dim © > 1. 


9 THE GEOMAGNETIC EXAMPLE 

As an application of confidence set inference, consider the inverse problem described in section 
2: to estimate the geomagnetic field B at the CMB from measurements of Cartesian components 
of B at and above the earth’s surface. 

The infinite-dimensional model space X consists of all the magnetic fields (2.1) outside the 
CMB whose sources are inside the core. Only prior quadratic bounds of the form (2.5) wifi be 
considered, so the dot product in X of the two fields B and B with Gauss coefficients/) /"(a) and 
is 

B dB := q~' £ C(l ) £ /)/"(*)&» • (9-1) 

/» 1 m =*-/ 

To study whether the data function F :X ->Y and prediction functional g :X ->R are con- 
tinuous under (9.1) it will be very helpful to have the particular orthononmal basis for X gen- 
erated by the individual spherical harmonics Yf. Let x/" denote the model field B all of whose 
Gauss coefficients vanish with the one exception that 

pr(a) = q»COr'* ■ (9.2a) 

Then (9.1) evidently implies that 

x/" □ xf = 5 n 5^ . (9.2b) 

By (2.1), if B □ x /”=0 for all l and m then B = 0 . Hence the collection of all the t” is an ortho- 
normal basis for X in the sense of Hilbert space. For any model B with Gauss coefficients/) /"(a), 
(9.1) and (9.2a) imply that 

*/ m □ B = q~' A C ( / )*/)/"(* ) , (9.2c) 

SO 

B = q-' A J J C{l)' A 'Zpr{a)xr, (9.2d) 

/ =1 / 
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the scries being convergent in the norm defined by (9.1) (Halmos, 1951). 

To study the continuity of the data function and prediction functional it will also be helpful 
to introduce the definitions 


E l := {x, m : 1 </ <L + 1 and -l<m<l) (9-3a) 

and 


Here sp Z L is the "span" of E L , the set of all finite linear combinations of members of E L . The 
set X lL] is a subspace of X , and if L then dimX [L] =L(L +2). The subspace X H consists of 
all the scries (9 .2d) which have only finitely many nonzero terms, and X is the closure of X w in 
the noim generated by (9.1). 

Now suppose a linear functional g : X — >R is given. Is it continuous? To find out, define 

g r:=q-»C(l)' A g(xn- (9 ’ 4a) 


If B e X (L] then evidently 

s(B)=x £ grpr(a) ( 9 - 4b ) 

/=! m=— / 


where PJfiia ) are the Gauss coefficients of B. By the Riesz-Fisher representation theorem, 
(Lorch, 1962, p.63), the linear functional g is continuous iff there is a ge X such that for even' 
BeX 


g (B) = g d B . ( 9 - 4c ) 

From (9.4a), it then follows that g (x/ m )=gnx ; m , so 

g= 9 w £c(/r 1 ' 6 £ s r*r • ( 9 - 4d > 

/=! m— / 


Therefore 


iigii 2 =<? icf/r 1 

/-l 


£ (gn 2 - 


(9.4e) 
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If g e X , ||g|| < », so g is continuous iff the series (9.4e) converges. The coefficients gf required 
for this test can be obtained from either (9.4a) or (9.4b). If g is continuous, then (9.4b) holds for 
all B € X ; that is 

s( b )=£ z grpr(°)> ( 9 - 4 0 

/ =1 m ®-/ 

the series being absolutely convergent for all B e X . 

It is of great interest to by to use the data to reconstruct B r , the radial component of B, on 
the CMB. To do so, consider the various functionals g :X -*R which give weighted averages of 
B r over the CMB, with various weighting kernels. It will suffice to consider axisymmetric ker- 
nels. To describe these requires more notation. If c e R , define 

S(c) := {r : r e R 3 and |r|=c); (9.5a) 

then S(c) is the surface of the three-dimensional sphere of radius c centered on 0. If 
/ : S (1) R , define the average off over S (1) by 

if (f)>f : = (4rr) -1 J d^fft) (9.5b) 

S(l) 

where d 2 t is the element of surface area on S(l). Every function G : [—1 , 1 ] — > produces an 

axisymmetric weighting kernel g(§) at each point a § e 5 (a ). The linear functional g (§) : X — » R 
is defined by requiring for each B € X that 

g(§) 0 B := (G (f • s)B r id t)) f . (9.5c) 

The functional g (§) has coefficients g/"(8) as defined by (9.4f). To find them, write 

G(tO=ZG l P l Qi) (9.5d) 

/ =1 

where P,(p.)is the Legendre polynomial of degree l and 

l 

Gj=(2l+l) \ dvG(M)P,(M)- (9. 5e) 

-l 

The addition theorem for real scalar spherical harmonics (Erdelyi et al., 1953) permits (9.5d) with 


o ♦ „ 


— m moo 
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\i = f * § to be written 

G(f §)=£c / £ W)>TO (9.50 

/ =0 m~~-l 

(recall that T/" are real). Then, from (9.5c), 

g(§) o B = £ G ; £ yr(s)(Y, m (t)B r (ai)) f . (9.5g) 

/ s =0 m =~0 

Because of the Schmidt normalization of Yj ", (2.1) gives 

< W)*r (« f)> f = (/+l)(2/+ir 1 /3; m (u ) . (9.5h) 

Comparing (9.5g) with (9.40 leads to 

S/ m ($) = (/+l)(2/+ir‘G / y ; m (§) • (9.5i) 

Two useful consequences of (9.5i) should be noted. By the addition theorem for scalar spherical 
harmonics, 

£ I g/ m (§) 1 2 = (/+l) 2 (2/+ir 2 G; 2 (9.5j) 

m=>-l 

and so (9.4e) becomes 

ltg(§)ll 2 = q £ (/+l) 2 (2/+ir 2 C ( / r 1 G, 2 . (9.5k) 

/-i 

As pointed out in section 6, CSI will fail if the prediction functional g :X —*R is discon- 
tinuous under (9.1). The weighted averaging functional g(s) defined by (9.5c) is continuous iff 
(9.5k) converges. If (9.5k) diverges, then the prior quadratic bound (2.5) is too weak to overcome 
the non-uniqueness of the prediction z in (4.1e) produced by the fact that dim X =«>. Then the 
particular average (9.5c) cannot be estimated from surface and satellite data with only the prior 
information (2.5). One special case of (9.5c) is obtained by putting G(/t)=<5(^), the Dirac delta 
function, so G; =2/+l. In this case, g (s) o B = (a s) (Backus, 1986). It follows that the values 
of B r at single points a § on the CMB can be estimated from surface and satellite data iff (if and 
only if) the prior quadratic bound (2.5) satisfies 
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X(/+l) 2 C(/r'<oo. (9.6) 

/=i 

For the energy bound (2.3), C(l ) = (/+l)(2/+l) -1 , and for the heat-flow bound (2.4), 
C(/ )=(/+l)(2/+l)(2/+3)/ -1 . For both these bounds, (9.6) diverges, so neither bound is strong 
enough to permit estimation of B r at the CMB from surface and satellite measurements of B. 
The best one can expect with those bounds is to obtain localized averages of B r . 

One such localized average is the unweighted average of B r over a disc of radius a radians, 
or aa km, centered on the point a § e 5 (a ). For this average, 

G(/i) = (l-cosa) _1 (9.7) 

if cosa<ii<\ and G(ji)= 0 otherwise. When /»1, P,(cos6) is asymptotically the Bessel 
function J 0 [d{l+Vi)], so (9.5e) becomes (Abramowitz & Stegun, 1964, pp.336 and 361) 

G t = 2a(l -cosa) -1 J x [a(l+ l A)} 

~ (1 - cosa) _1 (&ra// )** cos [a(/ + V4) - 3^/4] 

as l -» Then (9.5k) converges for the heat-flow bound (2.4) but not for the energy bound (2.3). 
The heat-flow bound is strong enough to permit estimating uniform averages of B r over circular 
patches on the CMB, but the energy bound is not. If the energy bound is to be used to estimate a 
weighted average of B r over circular patches, the coefficients G t in (9.5d) must approach 0 faster 
than l~' A as / approaches infinity, so the kernel function G(ji) must be smoother than the boxcar 
(9.7). For example, G (p.) might be a tent function, so that G/ behaves like / -3/2 as / — > °°. 

Now consider the data space Y . For simplicity it will be supposed that the data (2.2) consist 
of all three Cartesian components of B measured at D/3 locations r,- on and above S ( b ), the sur- 
face of the earth. The random errors <5y, of the individual components will be assumed to be 
Gaussian, independent, and identically distributed with mean 0 and variance 6 2 . Then for any 
data vectors y and y, their dot product is 

yoy = 0 -2 Xy. y,- (9.8a) 

i=i 
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or 

D/3 

y aS = d~ 2 £ B(r,.) B(r,) (9.8b) 

1=1 

where B(r ( ) and B(r ; ) are the two measured vector magnetic fields at the location r,. The dot 
product on the right in (9.8b) is the ordinary vector dot product in R 3 . 

The data function F :X -» T is very simple in the geomagnetic example. If y=F(B) then 
the D components of y are the D Cartesian components of B(r) at the D/3 locations r f . In par- 
ticular, (9.8b) implies that for two models B and BeX, 

F(B)aF(B) =6~ 2 £ B(r,) • B(r.) . (9.9a) 

i=l 

Therefore, 

||F(B)|| 2 =0- 2 £ |B(r,)| 2 . (9.9b) 

i* 1 

To use CSI, the observer must verify that the data function F :X ->Y is bounded (continu- 
ous) in the norms on X and Y defined by (9.1) and (9.8a). To prove this, note that for any r > a , 
(2.1) implies 

B(r f) = £ (a ir ) ,+2 £ ) b ; m (f) (9.9c) 

/=1 m=—l 

where 

b/"(f) : = -V [r -, “ 1 y/"(f)] r «i . 

Furthermore, by the addition theorem for vector spherical harmonics (Backus, 1988a, Appendix 
A), iff e 5(1) then 

£ I b/"(f) 1 2 = (/ +1)(2/ +1) . (9.9d) 

m=-/ 

Therefore, by Schwarz’s inequality, 

I im^)br(f) i 2 <(/+d(2/+d £ 

m =>-/ w =*-/ 
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and hence, using Schwarz’s inequality once more, 

|B(rf)| 2 <<7 || B|| 2 £ (/+1)(2/+1)C(/ T\alr) v ^ (9.9c) 

/=i 

where ||B|| 2 = BdB as defined by (9.1). If r >a and C{1 ) is any rational function of /, then 
(9.9e) converges, so the linear function B|-» B(rf) with domain X and codomain R* is bounded 
for any fixed rf with r > a. Then the data function F '.X — » Y is also bounded, and in fact (9.9e) 
implies that 

||F|| 2 £ (qD/3)9~ 2 £ (/+1)(2/+1)C(/ (9.9f) 

/-t 

as long as |r,- 1 Zc > a for all observation points r,-. In particular, ||F|| <«> and F is continuous 
when the prior quadratic bound is either the energy bound or the heat-flow bound. Both these 
bounds permit the use of CSI as long as the prediction functional is bounded. 

To cast the geomagnetic problem in the form (4.1), it remains to produce sets and 

S z cR such that if 77 and £ are the systematic errors in modelling the data and the prediction 
then the observer is confident that 77 e S y and £ e S z . For simplicity, it will be assumed here 
that the prediction r in (4.1e) is a numerical property of the magnetic field produced outside the 
core by currents in the core. Nothing else is involved in z . In that case, 2 is exactly calculable 
from the correct model x, so £ = 0 and 

S z = { 0 } . (9.10a) 

The specification of S y is more difficult, since 77 includes contributions to the data from the man- 
tle, crust, ionosphere and magnetosphere. To treat the crustal contributions, one can model them 
(Langel, Estes, and Meade, 1982), but here they will be controlled by assuming that all the data 
are collected from satellites on or above the spherical surface S (c), which lies at an altitude c —b 
above 5 (b), the surface of the earth. Several workers (Lowes, 1974; Langel and Estes, 1982; 
Cain, Wang, and Schmitz, 1988) have used the satellite data to estimate the spatial power spec- 
trum of the geomagnetic field, as defined by Mauersberger (1956), Lucke (1957) and Lowes 
(1974). They find the spectrum’s dependence on spherical harmonic degree / to be well fitted by 
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a sum of two terms representing spatially white sources slightly below S(a) and slightly below 
S ( b ). If the second term is interpreted as that produced by the magnetization of the crust, then 
the satellite data provide a bound on 77 c , the contribution to tj from the crust. At altitude 
c-b =400 km, the parameters of fit obtained by Cain et al. (1988) give the ims total intensity of 
the crustal field to be about 

« = 12 nT. (9.10b) 

If B is measured at D /3 points on or above S ( c ), then 

||7] c ||<(u/0)(D/3) H . (9.10 c) 

Estimates of the contributions to r) from the mantle, ionosphere and magnetosphere are 
more difficult, and will be neglected here. In practice, ionospheric and magnetosphcric effects 
have been minimized by selectively discarding data particularly sensitive to these effects. For 
example, in modelling the MAGS AT data, Langel and Estes (1985) used vector data only at 
geomagnetic latitudes below 50°. In the polar caps at higher latitudes they used only total inten- 
sity, which is less seriously affected than are the horizontal components of B by the field-aligned 
currents in the magnetosphere. Using total intensity makes the data function F :X ->Y non- 
linear, but it is nearly linear because above 50° geomagnetic latitude the radial component of B 
accounts for at least ninety percent of the total intensity. If only the crust contributes 
significantly to the systematic error, then (9. 10c) implies that one can take 

S y =B r (p) (9.10d) 

where B Y is the ball defined by (5.9b), and 

J3 = (u/0)(D/3) H . (9.10e) 

The foregoing discussion casts the geomagnetic problem in the form (4.1) with the prior 
bound (6.1). Now section 6 can be used to estimate the truncation errors produced when X is 
replaced by an N -dimensional subspace X N , so as to permit numerical calculations. Of course 
the answer depends on how X# is chosen. One choice (Backus and Gilbert, 1968) is to follow 
(6.10b) and take X N =N/k Then X N =sp (Fj, ....F^, ), F { being the data kernels defined by 
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(1.2a). This choice leads to no truncation error, so 

S\ N ^=S Y , (9.1 Of) 

but it has the disadvantage that N is likdy to be D or nearly so. Parker and Shure (1982) chose 
X N to be the span of a subset of ), chosen so that N «D but with the observation 

points sufficiently well-distributed to make the truncation error small. Parker and Shure used 
numerical evidence rather than a rigorous theory to estimate the truncation error. Still another 
X N is obtained by triangulating S ( a ), giving B r at the nodes (vertices), and interpolating B r into 
each triangle from the vertices (R. L. Parker, private communication). The oldest choice of X N in 
geomagnetism, and the only one to be examined in the present paper, is truncation of (2.1) at 
some degree L. Thus X N will be one of the spaces X^ L ] defined by (9.3b), so that N =L(L+ 2). 
The space will be written and the functions : A'| L] —> Y and 

F lL] : X lL] -+ Y will be defined by 

F {L y=F \X [L] , (9.11a) 

F lL] := F | X lL] . (9.11b) 

Thus X [L] and F [£ ,j are the X (0j n) and F (0>W ) of (6.5a), while X IL] and are the X( N ^ and 
F (A’,..) of (6.5b). 

To calculate the value of needed in (6.7c), note that if B e then (9.9e) remains 

true when the lower limit of the sum over / is L + 1 instead of 1 . Therefore (9.9f) becomes 

l|F^|| 2 £ (qD/3)9~ 2 £ (/+l)(2/+l)C(/rVc) 2; ^. (9.11c) 

l=L+l 

Backus (1988a, appendix A) sums this infinite series when C(l ) comes from the energy bound 
(2.3), and provides an upper limit when C ( / ) comes from the heat flow bound (2.4). The latter 
implies that for the heat flow bound 

|| F {L \<e~\qDl6)'\alc) L+ \l-{alc) 2 r Vi . (9.11d) 

(In equation (A.16) and the inequality preceding (A.15) of Backus, 1988a, the factor 1+2 should 
be 2/+1. This misprint does not affect the subsequent calculations.) 
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Nothing is gained by taking L so large that the truncation error (9.1 Id) is several orders of 
magnitude smaller than the crustal systematic error (9.10c). The truncation error is less than ten 
percent of the crustal error when 

(c/a ) L+3 > (10/u )(q/2)' A [\-(a/c ff* . (9.1 le) 

With u = 12 nT, q = 3xl0 17 nT : and c =6771 km, (9.1 le) leads to 

L> 27. (9.111) 

It remains to carry out the analysis of resolution described in section 7. For this purpose, it 
is necessary to find the eigenstructure (singular value decomposition) of F > Y, as 
described in appendix D. In general, this will require numerical calculations based on the actual 
locations of the D/3 observation points on or above S(c). To avoid such calculations, and to 
obtain answers in closed analytic form, two assumptions will be made: 

2L + l«D' A \ (9.12a) 

and the D/3 observation points will be all on 5(c) and so uniformly distributed that the sum 
(9.9a) can be approximated by an integral when B and B e Then 

F (L] (B) □ F [L] (B) =0- 2 (Z> /3) < B(c f) • B(c t ) ) f . (9.12b) 

This result is obtained from (9.9a) by assigning to each r, e 5(c) a small patch of private terri- 
tory of area \2jic z ID , the patches being chosen to cover 5(c) without overlap. If the r, are not 
nearly uniformly distributed, the data B(r ( ) can be weighted by areas of private patches which 
differ from one r,- to another, so as to achieve (9.12b) when (9.12a) holds. This complication will 
not be explored here. 

In particular, if / <L and l <L then (9.12b) applies to B=£/" and B =£/", whose Gauss 
coefficients are given by (9.2a). Lowes (1966) and Backus (1986) evaluate the integral on the 
right of (9.12b) in this case, with the result that 

F lL] (*r)zF [L] (xf) = 8 ir 8 ruh (qD/Sy-^a/cf^d+DCd ) _1 (9.12c) 

Equations (9.12c) and (9.2b) show that when L satisfies (9.12a) then is an orthonormal 


September 19, 1988 



George E. Backus 


Confidence Set Inference 61 


eigenbasis for F \ L] in X^y The corresponding eigenfactor, 0/” = ||/ r (x”)|| /||x ; m ||, does not 
depend on m ; from (9.12c) it is 

<p, =6~ ] (qD Yi^ia Ic ) /+2 (/ +1 )' A C ( / f A . (9.1 2d) 

The cobasis for in Y corresponding to the eigenbasis Z L consists of the vectors 

$r=<pr'F(zn- ( 9.120 

Therefore, by (D.l 1c), 

Ffi^L^r 1 £ *r$r- ( 9.120 

/ *=1 m— / 

It remains to consider the spaces which appear in section 7, and to choose n to 

minimize the length of the confidence interval for 2 defined by (7.9a) and (7.11a). 

Because 0/" is independent of m , all the £/” with the same / will be treated together, so only 
spaces X( n<0) =X f/ j defined by (9.3b) will be considered. Here 1</ <L and n =1(1+ 2). The 
models g (0>B ) and g ( „ ^ will be written g [; ] and g f/ 1 so, by (9.4d), 

(9.13a) 

j - 1 *»—./ 

and 

g m = <?* £ COT* £ S/ 1 */ 1 , (9.13b) 

y«/«fl m=*-y 

Therefore, 

llg I/] ll 2 = <? £ COT 1 £ Isfl 2 . (9.13c) 

y=/+l m=—j 

In (9.13a, b,c), g” is defined by (9.4b) rather than as a Gauss coefficient of g. Similarly, 
7 (o,«) : = g(o^)DF(o!«) will be written as 

rt/J = g[/l°Ff / 1 ] (9.13d) 

so that, by (9.120, (9.13a) and (9.12d), 
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r[/] =0(3/D) M £ (c/ar 2 (j+\f A £ g?$f (9 13e) 

7=1 m =-J 

and 

|J rin ||2 = (302/£>)£(c/ fl )27 + 4 C/ + 1) -l £ |^m,2 (9 .131) 

7=1 »»— 7 

In (7.11), with n =1(1+2), Kf 0n) (p), T p {n) and K n (p) will be written Kf, j (p), T p [l] 
and *•[/ ](p). Since S z - {0 J in (7.11a), it follows that 

l*f/)(P)l =7'pt/] = llg [/J ll + llr [ /)ll^[;](p)- (9.13g) 

If L is chosen to make the truncation error (9. lid) one tenth of the crustal error (9.10c), the/3 in 
(7.11c) is given by 

/3 = (l.l)(u /<?)(£> /3) 1 * . (9.13h) 

Now the values of T p [ l ] for 1 < / < L must be examined, to see which is smallest. 

Two extreme examples of predictions z will illustrate the foregoing calculations. First, sup- 
pose the prediction functional g :X —> R corresponds to the kernel g(§) defined by (9.5c). In this 
case (9.5j) converts (9.13c,f) to 

llg t,] ll 2 = <? £ (j+\)\2j+\T 2 GfC(jT' (9.14a) 

7-/+1 

and 

\\Yii )H 2 = <&* ID ) £ ( c/a ) 2j+A U+WJ +D" 2 G/ . (9.14b) 

7-1 . 

The sum (9.14a) can be computed once for all when l =L, and then only L =27 expressions 
(9.13g), for \<l <L, need be computed to minimize the confidence interval for z at failure rate 

P- 

In the other extreme example, z is a single Gauss coefficient of the true core field: 
z =/3/"(a ). Then in (9.4b) gf = 5 ; f 5^ . One must take L > l , and then only the single subspace 
X [ ; ] need be considered. Clearly 
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||g [/) ||=0 (9.15a) 

lirmll =eoiD)\i+\y'\cia) M . (9.i5b) 


Then (9.1 3g) becomes 

\Kf,](p)\ =(l+\r\c/a) l+2 [(l.l)u +(3 ID)'*$v(y {l] ,p)] . (9.15c) 

If the crust can be treated as a two-dimensional random process on S(a), then the systematic 
error in (9.15c) is only the truncation error, (0.1)u instead of (l.l)w. The crustal error can be 
included in 5y with other random errors, and 6 in (9.15c) must be replaced by (0 2 -wc 2 u 2 )' /i where 
a > 1, and a = 1 only if the crustal statistics introduce no correlations between the <5y, at different 
measuring points (Langel, Estes, and Sabaka, 1988). In the most favorable case, a = l, and 
(9.15c) is replaced by 

l*f/j(p)| =(/+l)- , V/fl) /+2 [(0.1)« +(3 ID)'\e 2 +u 2 )\(y [n ,p )] . (9.15d) 


-H-+++++++++-H-++++++Table 2 near 


Table 2 gives \Kf, ] (p)\ and \Kf l] {p) \ in microTeslas. The values quoted in the table are for a 
failure rate p = 10 -4 . Using the failure rate p = 10 -2 does not change \Ky j (p) | by an amount 
sufficient to affect the table, while that larger failure rate lowers \Ky } (p) | by about ten percent. 
For comparison, Table 2 also gives the rms value (/?/"(£ ) 2 )„ of the Gauss coefficients of degree 
/ at the CMB, where 

<£/”(* ) 2 > m : = (2/+1) -1 £ gP(a ) 2 . (9.15d) 

The values of ((3 fia ) 2 )„ are from Langel and Estes (1982). In Table 2, u = 12 nT (Cain et al„ 
1988), 0 = 6nT (Langel, et ai, 1982), and D =26,500 (Langel and Estes, 1982). The random 
errors are assumed to be Gaussian, so v(y t/ j,p) is independent of and is given by (3.9) or 
Table 1. 

The implications of the computation leading to Table 2 are the following: the heat- flow 
bound permits truncation at L =27 with a truncation error of 1.2 nT in each measured component 
of B. Suppose a failure rate p = 10 -4 is acceptable. If the satellite data on a sphere of radius c at 
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altitude c-b =400 km are fitted by least squares with a model (2.1) truncated at / =A, and if 
(9.10c) is accepted as a bound on the crustal field at satellite altitude (or as its standard deviation 
if it can be viewed as random) then each Gauss coefficient on the CMB has a confidence 

interval A'f/j(p) or A'f/j(p) whose half-length is given by Table 2. These half-lengths do not 
depend onm. If the crustal error must be treated as systematic, column 3 of Table 2 is appropri- 
ate, and shows that Gauss coefficients with / >9 are not resolvable at the CMB. If the crustal 
error can be treated as random, with uncorrelated contributions at different measuring points, then 
column 4 of Table 2 is appropriate, and Gauss coefficients at the CMB become unrcsolvable 
when / £ 12. If only Gauss coefficients of degree </ are known, the circle of confusion on the 
CMB has a diameter of about 4(/+l)" 1 radians (Booker, 1969). For / =8 this is about 25° and for 
/ = 11 it is about 19°. 

Notice that the confidence interval specified by (9.15c) or (9.15d) does not explicitly 
involve the value q =3 x 10 17 nT 2 from (2.4). That value enters only in choosing the truncation 
level (9.1 If)- The relationship between q and the minimum acceptable truncation level which 
produces a truncation error no more than ten percent of the crustal error is (9.1 le). If q is 
changed from <7 j to <7 2 , L must change from L j to L 2 where 

L 2 -L 1 = log(<? 2 /<? 1 )[21og(c/a)r 1 • (9.15f) 

If q is increased by a factor of 3.8, L must be increased by 1. If q is increased by a factor of 
14.2, L must be increased by 2. Clearly the conclusions of CSI in this case are quite insensitive 
to the numerical value of q in (2.5). What is important is that the quadratic form Q (x, x) is 
known, and that the bound q is known to within a few orders of magnitude. 
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10 CONCLUSIONS 

The uniqueness question in geophysical inversion is the question of inference: to predict from D 
observed numerical properties of the earth, y \ 0) , not only values but error bounds for P 

other numerical properties of the earth, z j, .... z P . Except in very special cases, infinite dimen- 
sionality of the model space X makes the uniqueness problem insoluble without prior informa- 
tion to supplement the data. Such information is available from laboratory experiments, physical 
theory, and other studies of the earth. Even if the prior information is not certain, accepting it as 
a working hypothesis is a useful way to approach the data. 

One particular kind of prior information is a quadratic bound (1.4a) on the correct earth 
model, usually a bound on energy or dissipation rate. In earlier studies, such prior information 
has been "softened" to a probability distribution p x on the model space X, so that stochastic 
inversion (SI) or Bayesian inference (BI) could be used to incorporate the prior information into 
the inversion. Recent work (Backus, 1988a) has shown that this "softening" process always adds 
spurious new prior "information" not implied by the original inequality (1.4a). Nevman’s (1937) 
theoiy of confidence sets provides a technique, confidence set inference (CSI), for directly incor- 
porating unsoftened hard prior bounds into the data inversion without generating spurious infor- 
mation. The constant q in (1.4a) is usually known only to within a few orders of magnitude, so 
an essential part of CSI is the analysis of the sensitivity' of its conclusions to variations in q . 
With CSI the uncertainty in q cannot be dealt with by "softening" (1.4a) to a probability distribu- 
tion, as is done in BI and SI. 

CSI requires that the formal description of the linear inverse problem include the following 
ingredients, all explicitly known to the observer: a linear model space X , a positive-definite qua- 
dratic form <2(x,x) onX, a data space Y =R° , a prediction space Z =R P , a linear data function 
F :X —*Y, a linear prediction function G :X — »Z, a subset S 5 c Y, a subset S z qZ, and an m- 
parameter family of probability distributions p ( 6 , •) on Y, where 6 =(9j, ...,0 m ) is the parameter 
vector. All the p(6,-) must have as their domain the cr-ring Ly of Lebesgue measurable subsets 
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of Y . The linear functions F and G must be continuous (and therefore bounded) in the norm 
defined on X by ||x|| = Q (x, x) 14 . 

To use CSI, the observer needs six more objects: p E , 8 y, 0 0 , x, tj and f. These may be 
unknown, but the observer must know that they exist and that they have the following properties: 
p E is a probability distribution on Y with domain I y ; 8 y = ( 8 y It ..., 8 y D ) is a vector drawn at ran- 
dom from Y according to the probability law p E \ and 


6 0 e R m , (10.1a) 

x e X , (10.1b) 

77 e S Y , (10.1c) 

C e S z , (lO.d) 

Q (x, x) < 1 , (10. lc) 

Pe=P@ o.). 00.11) 

y (°) = F(x)+i7 +6y , (lO.lg) 

z = G(x) + £. (lO.lh) 


In (lO.lg), y (0) is the observed data vector (y j (0) , ...,y^ 0> ) e Y ; and in (10. lh), z is the desired pred- 
iction vector ( 2 1 , .... z P )eZ. The vectors 77 and £ are the systematic errors in modelling the data 
and predictions, and 8 y is the random error in the data. The vector x is the ’’ correct" earth model, 
and 60 is the "correct" parameter vector describing the probability distribution of the random 
error vector 8 y in T. Neither x nor<? 0 need be unique. 

The final result of CSI is to produce for eachp e (0, 1] a "confidence set” K z (p)^Z such 
that either 

z eK z (p) (10.2) 

or an event E has occurred whose probability is no more thanp. The statement (10.2) is said to 
hold at confidence level 1 -p, or with failure ratep. Section 3 describes a rudimentary calculus 
which keeps track of the failure rates of the statements in any chain of argument whose 
hypotheses have known upper bounds on their failure rates. This calculus of failure rates greatly 
facilitates construction of the confidence sets K z {p). Unlike BI or SI, CSI gains nothing by per- 
mitting dimZ > 1, so in the present paper Z is always the real line R , G :X —*Z is always a 


September 19, 1988 



George E. Backus 


Confidence Set Inference 67 


functional g : A' — and K z (p) is a subset of R. Therefore the size \K z (p)\ is easy to define, 
and here is taken to be half the diameter of K z (p). Usually K z (p) is an interval, and \K Z (p)\ 
is half its length. Many subsets K z {p)QR will make (10.2) correct with failure ratep, and one 
goal of CSI is to find a K z {p) for which | K z (p)\ is small enough to make the estimate of the 
prediction z useful. 

Section 5 shows how to construct K z (p) in the simplest special case of (10.1), in which p E 
is known, dim A' <dim J 7 , and F :X — » Y is injective. In this special case, (lO.le) is not needed, 
and the continuity of F and g is an automatic consequence of the fact they are linear and that 
dimX <°c. The construction of K z (p ) leans heavily on the fact that p E provides a natural dot 
product on Y, the only dot product under which the variance tensor of p E is the identity tensor on 
Y. When S Y and S z are balls or parallelipipeds centered on the origins in Y andZ, K z (p) is an 
interval whose center, is g (x^) where is the model in X which best fits the observed 
data vector in the sense of least squares, weighted by the inverse of the variance matrix 

£[(4&Xfy)]- 

When dimX =«, (lO.le) definitely is needed, and the dot product Xj □x 2 =Q(x 1 ,x 2 ) for 
xi,x 2 e X plays an essential role in the argument. Continuity of F and g is not automatic, and 
any Q which makes F or g discontinuous is too weak to resolve the nonuniqueness inherent in 
trying to make infinite-dimensional inferences from finitely many data. When F is discontinu- 
ous, the observer has no option but to seek a stronger Q . When g is discontinuous, a second 
option is available: to replace g by a continuous g A which resembles g closely enough that the 
new prediction, z A , is an acceptable substitute for z . 

When dimX =<» and p E is known, CSI proceeds in two steps. The first (section 6) pro- 
duces a finite -dimensional approximation to (10.1), and the second (section 7) produces an injec- 
tive approximation to the first, thus subsuming the approximate problem under section 5. In the 
first step, the observer chooses any finite N and any //-dimensional subspace X N of X. Then 
(lO.le) provides an N -dimensional approximation to the original infinite-dimensional problem. 
The new model space is X^^=X N , the new data function is F (0yV) =F | X^y and the new 
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prediction functional is g(o^i)-g I X (0 The approximation process adds a "truncation error" 
to each of the systematic errors rj and C in ( 1 0. 1 ), and so changes S Y and S z to larger sets S Y N >oo) 
and «), but otherwise leaves (10.1) unchanged. One possible choice oiX N is N p, the orthog- 
onal complement of the null space of F. This choice has two advantages: its data function F (0iAI) 
is automatically injective, so section 7 can be bypassed; and it may or may not add a truncation 
error to £ in (10- Ih), but never adds a truncation error to ij in (10. lg). Thus it assures 
S(N,~) ~S Y - The disadvantages of the choice X F =N p are that it requires manipulation of D xD 
full matrices ( D = 26,500 for many MAGSAT studies) and that the \K z (p)\ it produces may be 
unnecessarily large by several orders of magnitude. If the data are relevant to the predictions, 
usually there will be an X N with N « D which produces much tighter error bounds on z than 
does N/k 

Section 7 describes the second step in approximating (10.1) by a finite-dimensional prob- 
lem. This step depends on computing the eigenstracture (singular value decomposition) of 
> so dot products on both X^y and Y are essential. If0i£0 2 - * * ‘ ^ <Pm >0 
are the nonzero eigenfactors (singular values) of F (0>W ), and if {$!,... ,£ w ) is a corresponding 
onhonormal eigenbasis for Y dF^), with cobasis {5v — > $ M ) , then for i = 1, ...,M , 

. O0.3a) 

while if xe Pi {xj .-.x^} 1 then F(x)~Q. The tensor F which 
corresponds lo F^y.X^^ — tY is 

M 

F ((W = X Qi Si ■ (10.3b) 

i«l 

Let X (o^i ) be the subspace of Y qF ( 0iN ) spanned by ), and let F^y.-F | X ). If 

1 <n<M , then F Q^y.X F (X is an injection, so section 5 produces a confidence set 
X ("o,n ) (P ) ■ For each n, 

z e /srfo,n)(p) (10.30 

with failure rate p. Now the observer can choose the n in 1 <n<M which minimizes 
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\K% A ){p)\. If the data are relevant to the predictions and X N is large enough to give a good 
approximation to the original problem (10.1), then this optimal n will be neither 1 nor Af, and 
must be found by numerical computation. In the approximate inverse problem whose data func- 
tion is the model components £,■ ox with j <n are estimated from the data 

vector y (0) , and provide an estimate z^j for the prediction z. The components with j > n are 
not used in estimating z ) . All the model components with 1 <j<N contribute to 
\K% fl) {p) | , the error estimate for z. The contributions of the components with j <n arise from 
the random error 8y and the systematic error T] in the data vector, while the contributions of the 
components with j>n come from the prior quadratic bound, (lO.le). If n is too small, useful 
data are being discarded, and if n is too large, some data are being used whose errors are so large 
that better error control in z is provided by the prior quadratic bound on x. 

If p E is not known, it can be estimated from the data as long a sm+n «D . This problem 
is not examined here for m > 2, but the case m = 1 is studied in one common situation: the vari- 
ance matrix E [(£>’,• X^A/Xl is supposed to be an unknown multiple # 2 Vj of a known D xD matrix 
Vj. The result, which seems likely to generalize to any m «£> - n, is that ifp is not too small 
relative to (D -n )~' A in the sense of (5.8a), then a confidence set AT e (p x ) for# can be found such 
that p j «p and yet K e (p j) is so small that every 6 g K & {p j) produces nearly the same p (#,-). 
Then for any 6 e AT e (pi), p E can be assumed equal to p {&,-), and sections 5, 6, and 7 can be 
invoked for this known p E . 

The problem of geomagnetic modelling at the CMB illustrates CSI. The data are measure- 
ments of Cartesian components of the geomagnetic field B at points r,- on or above the earth’s 
surface. The model space X is the set of all B outside the CMB which can be produced by elec- 
tric currents inside it. Both the energy bound (2.3) and the heat-flow bound (2.4) make the data 
function F : X — » Y continuous. If the prediction functional g :X ->R is defined by choosing a 
fixed r on the CMB and requiring g(B)-B r (r) for every B g A', then both the energy bound and 
the heat-flow bound make g discontinuous (unbounded). Therefore neither bound enables the 
observer to estimate B r (r ) on the CMB from the data. If g(B) is the unweighted average of B r 


September 19, 1988 



George E. Backus 


Confidence Set Inference 70 


over a disc on the CMB, the heat- flow bound makes g continuous and g(B) observable (i.e., 
estimatable from the surface data), while the energy bound makes g discontinuous and g(B) 
unobservable. If g (B) is the disc average weighted by a tent function, both prior bounds make g 
continuous and g (B) observable. 

Among the many choices for X N which have appeared in the literature of geomagnetic 
modelling, section 9 examines only the oldest, X N =X } , the space of models (2.1) whose Gauss 
coefficients /3/"(a) vanish for / >L. Then N =dimX IZ ,] = L(L+2) and F^tsF^^FIA'^j. 
To avoid numerical computation in finding the eigenstructure of F^j, some simplifying assump- 
tions are accepted. The data consist of all three Cartesian components of B measured by satellite 
at D/3 positions r ( - on a single spherical surface 5(c) of radius c =6771 km. The positions r,- are 
nearly unifoimly distributed on 5(c), and only truncation levels L are considered for which 
2L+1 « (D/3)' a , so the necessary sums over the r, can be approximated by surface integrals over 
5 (c ). The random errors in the D individual component measurements of B are assumed to be 
independently and identically distributed as Gaussians with mean 0 and variance 0 2 . Then an 
orthonormal eigenbasis for F [L] in X [L] consists of the magnetic fields with a single non- 
vanishing Gauss coefficient, and the data vectors generated by these fields are the cobasis. The 
eigenfactors for F^] are calculated analytically to yield explicit formulas for |F z (p)| . Table 2 
shows the results when the satellite data are used to predict the Gauss coefficients /3/"(a) at the 
CMB. In that table, MAGS AT parameters are used: D = 26,500; 6 = 6 nT; and a systematic rms 
error u = 12 nT is produced in the data by the crustal field. The possibility is also considered that 
the crust might be amenable to treatment as a two-dimensional random process, so that u 
becomes a standard deviation rather than an error bound. The truncation level L =27 is chosen to 
make the truncation error no more than 1.2 nT. The actual value of q in (2.4) enters only through 
this truncation level. If q were 14 times its value in (2.4), the truncation level would have to be 
L =29. Thus the conclusions are quite insensitive to q. Table 2 is calculated for a failure rate 
p = 10 -4 , but relaxing this top = 10" 2 would not affect the table entries for the systematic crust 
and would decrease those for the random crust by ten percent An improvement of accuracy by a 
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factor of about 8 is produced if the crustal error can be treated as random rather than systematic. 
The smallest circle of confusion in observing B r on the CMB has diameter about 26° for a sys- 
tematic crustal error, and about 19° for a random crustal error, corresponding to highest resolv- 
able degrees of / = 1 1 and l = 8 respectively. 
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APPENDIX A. NOTATION FOR SET THEORY 

The notation for set theory will be that advocated by MacLane & Birkhoff (1967) and Birkhoff & 
Bartee (1970). For any objects, u , v , w , { u , v , w } denotes the set whose members are u , v and 
w, while (u,v,w) is the ordered triple. Thus {«,v,w} = {v,h\k} but (u,v ,w)*(v ,w,u). 
The null set, the set without members, is written 0. If X is a set, U cX means that U is a subset 
of X , i.e., every member of U is a member of X ; and x e X means that * is a member of X . Thus 
u e [u, v,w ) and {w ,u } c [u , v,w ), but (u , v) e. {«, v,w ) and (u,v)s£{k,v,w}. The sym- 
bol "x e X" also abbreviates the noun clause "x which is a member of X." If P [x] is a statement 
about x, such as "x is an odd integer," then {x\P[x]} is the set of all x for which P [x] is true. 
The symbol means "is defined as.” If X and Y are sets, then X n 7 := {u : u e X and Kef), 
Xuf:=(ic«eX or u e 7 or both}, X \ Y :={«:«€ X and u e 7}, and 

X xY := {u :u = (x,y) and x e X and ye 7}. This last definition can be abbreviated 
X xY := {(x,y) :x e X and ye 7}. The sets X nf, X uY, X \ Y andXxY are called the 
intersection, union, difference and Cartesian product of X and Y. 

Suppose X and Y are sets and F is a function which assigns to each x e X a value 
F(x)e Y. Then one writes "F :X ->7," usually read as "F maps X into 7." The symbol 
”F :X 7" also stands for the noun clause "the function F which maps X into 7." The sets X 
and 7 are the domain and codomain of F. Two functions F and G are equal iff (if and only if) 
they have the same domain X, the same codomain 7, and F(x) = G(x) for every x eX. When 
F can be given by an explicit formula, another notation is available. For example, if R is the set 
of real numbers and F :R —>R is defined by requiring F(u) = u 2 +3u for every u e R, then 
"F :X — >7" can be replaced by "F :u(-»u 2 + 3m," or simply by "«Kk 2 +3u." If P[x] is a 
statement about x, and F :X -» 7, then the set (y :y =F(x) for at least one x eX which makes 
P [x] true} is abbreviated as {^(x) :x e X and/’fx]}, or simply as {/ r (x):/ > [x]}. 

If F :X -4 7 and U cX, then F(U) stands for the set {F(u) : u € U}. The set F(X) is the 
"image" of F . If U c X , then F [ U denotes the function F :U ->F(U) such that F(u)=F(u) 
for every u g U . The function F \ U is called the "restriction of F to U ." Its image coincides 
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with its codomain. 

If F :X -+Y and G :Y —>Z, then the "composite" of G with F is the function 
(GoF):X -4 Z defined by requiring that for each x e X 

(< GoF)(x)’.= G(F(x )). (A- la) 

If also H :Z -4 W, then (A. la) implies 

Ho(GoF) = (HoG)oF . (A. lb) 

The identity mapping on X is the function I x :X ->X such that I x (x)=x for every' x e X. 
If F :X -4 Y, then clearly I r oF = F =FoI x . 

If F :X -4 7 and F(X)=Y, then F is a "suijection." If F(x)=F(x) always implies x =x, 
then F is an "injection." If F has both properties, it is a "bijection of X to Y ." If F :X — » T is a 
bijection, then it has an inverse function, F _1 :7 ->X, defined by requiring that for each y e Y, 
F -1 (y ) is the unique x e X such that F(x)=y. 

A function G :Y ->X is a "left inverse" of F :X -4 Y if GoF = I x , and a "right inverse" of 
F \i FoG =I Y . If F :X -4 7 has a left inverse, then F is an injection; and if F has a right 
inverse, then F is a suijection. If F :X -> Y has both a left inverse G : Y -*X and a right inverse 
H :Y - 47 , then F is a bijection and G =H =F -1 . For example, G =GoIy =Go(FoF -1 ) = 
(GoFjoF -1 = I x oF~ l =F-\ 

R will always denote the set of real numbers. If a and b e R then a /\b is the smaller of a 
and b , while a v£> is the larger. The four kinds of intervals are 

[a,b] :={«:« e R and a <u <b), (a,b):= [u :a <u <b), [a,b):= {u :a <u <b), and 
(< 2 ,Z>):={u :a<u<b). Suppose 7 is a real vector space, N is a positive integer, and for each 
i e {1, a, e R and F,- : X -4 7 . Then the linear combination, the function 
(XlL\ fljF,) ; X -4 7, is defined by requiring that for each x eX 
' N ' N 

2«iF/ (x):= Efl.-FiCO. (A.2a) 

l‘=l J ‘=1 

If both X and 7 are vector spaces, a function F : X -4 7 is called "linear" if for every positive 
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integer/^ and any a j e R anc * x n — »xjv e ^ 


N 

Z «.-*i 

i=i 


= Z 

1=1 


(A.2b) 


Any linear combination of linear functions F, :X — »y is itself linear. Since R is a one- 
dimensional real vector space, the definitions and remarks of the foregoing paragraph apply to 
functionals (functions with codomain R ). 

If X, Y and Z are real vector spaces and T:X xf -»Z, then T is "bilinear" if T(x,y) 
depends linearly on each of x and y when the other is fixed. Linear combinations of bilinear 
functions are defined by (A.2a) and are themselves bilinear. 

Suppose U and V are subsets of real vector space X , and A and B are subsets of R . Then 
AU + BV := [au + bv :a e A,b e B.ue U and ve V) . (A3a) 


If c e R and x&U then 


cU := {c }U , (A3b) 

- U := (~l)U , (A3c) 

x + U := {x} + U , (A3d) 

Ax:=i4{x) . (A3e) 


In particular, of/ = {0} and \ U - U . Notice the distinction between U -V and U \ V. 

A cr-ring (Halmos, 1950) is a non-empty set X whose members are subsets of another set X. 
To qualify as a cr-ring, X must have two properties: (1) P and Q e X implies P \ Q e X; (2) if all 
sets of the countably infinite sequence Q Q 2 , Q 2 , • • • are members of X, then so is their union, 


Q : vQ 2 uQ 2 v • • ■ (abbreviated as u ( „i (2< )• 

A measure fi on a set AT is a function whose domain is a cr-ring X of subsets of X, whose 
codomain is R+, the set of non-negative real numbers together with +°°, and which satisfies 


;=i 


(A.4a) 


whenever Q t e. X and <2, nQj =0 for all i and all j *i. Property (A.4a) is called "countable 
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additivity,” and the members of I are then called the /x -measurable subsets of X (Halmos, 1950). 
If the whole set X is /J. -measurable and if 

/x(X) = 1 (A.4b) 

then /z is called a "probability measure" or a "probability distribution" on X. The /z -measurable 
subsets P and Q of X are then called "events." P is called "the event P ," "the event that x e P 
or "the event that P happens." Thus jx x (P ufi) is "the probability that at least one of? and Q 
happens," while fL x (? r> Q ) is "the probability that both ? and Q happen," and n x (? \ Q ) is "the 
probability that ? happens and Q fails." Clearly 

H X (P vQ)ZHx(n+»x(Q) (A.4c) 

and 

/x* CP n Q ) <nx (f ) a/x x 02 ) • (A.4d) 

If p. x and Hy are probability measures on sets X and Y, their cr-rings being I. x and ly, then 
the product measure fd Xx y on X xY is defined as follows. Its a-ring is the intersection of all the 
cr-rings of subsets of X x Y which include as members all the sets ? xQ with Pel* and 
Q eZy. For such sets, 

Mxxr( p xQ):=^x( /> )/ i r(!2) (A.5) 

and (A.4a) permits the calculation of fJ- x *y(W) for any other set W € I XxX (Halmos, 1950). Set- 
ting P -X and Q = Y in (A.5) shows that ti XxX is a probability measure on X x Y . In this con- 
text, if Pel* and Q e Y. y , then the event P xY is called the event P , and the event X x Q is 
called the event Q , so (A.4c,d) remain true. The probability that at least one of events P and Q 
happens is no more than the sum of their probabilities, and the probability that both P and Q 
happen is no more than the smaller of their probabilities. 
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APPENDIX B. REAL HILBERT SPACES FOR MODELS AND DATA 

In linear inversion, the model space X is a real linear space (vector space; Halmos, 1958), and its 
dimension, dim X , is infinite. A quadratic inequality like (2.3) or (2.4) induces on X a natural 
dot product, which makes X a pre-Hilbert space. Then (Halmos, 1951) X can be completed to a 
Hilbert space. On the data space Y , the probability distribution for the random error vector 8y 
induces a dot product. Since £> = dim T < <*>, Y is automatically complete, and so a finite- 
dimensional Hilbert space. These natural dot products on X and Y will be constructed in this 
appendix. 

On X , a quadratic inequality like (2.3) or (2.4) produces a symmetric, positive-definite bil- 
inear functional Q . This Q assigns to any pair (x, X) of models in X a real number Q (x, X) such 
that 


£2(x,X) = !2(x,x); (B.la) 

Q(x,x)>0 (B.lb) 


if x * 0; and for any positive integer N , real numbers a j a N , and models x 0 , x j, ..., x N , 

N N 

Q(xq, £ c/x,)= Z a,!2(xo.x 1 ). (B.lc) 

;« i i=i 


Q is "bilinear" because (B.la) and (B.lc) imply that f2(x,X) is also linear in x when X is fixed. 
The observer’s prior information is the knowledge of a real number q such that the correct earth 
model x must satisfy Q (x, x) < q , or 

<7 -1 i2(x,x) < 1 . (B.2) 


For example, in (2.3) if )3/ m (a ) and j5/ m (o) are the Gauss coefficients of the magnetic fields B and 
B at the CMB, then 


q~'Q( B,B) = (2xl0 3 T 1 Z Z 

/ =1 m=~l 


■\33\-l 


/ 


2/+1 


/+1 


prwfirw- 


On X , define the inner product or dot product x dx by 
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xdx:=<7 '(2(x.x) (B.3a) 

and the length ||x || by 

||x|| :=(xdx) h (B.3b) 

The notation v • v will be reserved for use with the ordinary real three-dimensional vectors. On 
being completed in the norm ||x ||, X becomes a Hilbert space. The investigator’s prior quadratic 
information (B.2) can now be written 

11x11*1. (B.4) 

In geomagnetism, the simplicity of the recursion relations for complex spherical harmonics 
sometimes makes it useful to consider a complex model space X *, one whose scalars are com- 
plex. A real Hilbert space X can always be extended to a complex Hilbert space X \ The 
members of AT* are the formal symbols x+t x where x and xe X and i =(— l)* 4 . Linear combina- 
tions with complex scalars are defined in the obvious way using i 2 =- 1, and the complex conju- 
gate of x+j'X is (x + ix) c =x-j'X. If Xj, x 2 , Xj, X 2 e X, the dot product of Wj =xi + j'xi and 
w 2 = x 2 + / S 2 is defined as Wj o w 2 := Xj □ x 2 - Xj □ X 2 + / [Xi □ x 2 + Xj □ x 2 ] and the inner product 
of Wj and w 2 is (wj| w^ := wj C dw 2 . Complexifying X in this way introduces a distinction 
between inner and dot products which engenders complications in the notation, so only vector 
spaces with real scalars will be admitted here as model or data spaces. 

On the data space Y , the inner product comes via integration from p E , the probability distri- 
bution of the random error vector Sy. As usual, integrals of real-valued functions of Sy with 
respect to p E will be written as expected values. Thus 

E [fy t ] := J dp E (y)y,- (B.5a) 

Y 

E KSyi X8yj )] := J dp E (y)y,y y . (B .5b) 

Y 

If p E is known and £[5;y ( -]*0, then its known value can be subtracted from both sides of (1.1). 
This redefines y (0) and 5y in such a way that 

£[<5y,0 = 0. (B.6a) 
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If p E is being estimated from the data, £[<5y ( ] will be estimated. Alternatively, p E can be 
parametrized so that £[<5y, ] = 0, and the necessary correction can be added to r\ t or £, (x) and 
subtracted from 8y t in (1.1a). This slightly increases the bound on tj or the amount of informa- 
tion carried by the models in X, and again achieves (B.6a). Henceforth, (B.6a) will be accepted. 
Then the D xD variance matrix V of Sy has entries 

V, y =£[(5>' )(<5y ; )] . (B.6b) 

This variance matrix is positive semi-definite. If it is not positive-definite, then, with probability 
1, <5y lies in a subspace U of Y (Cramer, 1946). Then there are only dim U linearly independent 

linear combinations of 8y x ,..., 5y D . The corresponding linear combinations of y D can be 

taken as the new data, and U is the new data space. The remaining D -dim U linear combina- 
tions of y have no random error. In the real world, this must mean that their values are 
obtained from theory rather than measurement. They are not really data, and can be discarded. 
(Usually, they will all vanish.) Henceforth, the matrix V of (B.6b) will be supposed positive- 
definite. 

Now let W := V 1 . The D xD matrix W is the "weight matrix" generated by the random 
errors. It is symmetric and positive-definite. On the data space Y define the dot product of 
y = (y Yd ) and $ = (y x y D ) as 

y □ y : = X yiWijSj. (B.7) 

< j=i 

This dot product measures the data in units of their random errors, so it is a dimensionless pure 
number. It will now be shown to have the property that for any fixed y and y in Y 

E [(y o5y) (5y ciy)) = y o y . (B.8) 

Conversely, no other dot product on Y has the property (B.8). That (B.7) does imply (B.8) fol- 
lows immediately from (B.6b) and the definition of W. For the converse, suppose that yo y is 
any dot product on Y . There will be a symmetric, positive-definite D xD matrix U such that 

y o y = x y> u u yj ( B - 9 ) 
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(Halmos, 1958). If the dot product (B.9) has property (B.8), then 

D D 

Z yi u u v jk u u Si = z >’<• u u Si • 

i,j.k,l=\ i,/=l 


Since y and f are arbitrary, it follows that UVU = U . Since U is positive-definite, U~ l exists, so 
UV = I , where 1 is the D xD identity matrix. Hence U = V 1 = W, so (B.9) and (B.7) agree, 
and yoy = yoy. 
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APPENDIX C. TENSORS AND LINEAR MAPPINGS ON HILBERT SPACE 

This appendix is a self-contained list of the facts and sometimes slightly unconventional Hilbert 
space notation used in the present paper. Proofs of all assertions can be found in Halmos (1951), 
Dunford and Schwartz (1958), or Lorch (1962). Some of the shorter proofs are given here to aid 
the exposition. 

A linear mapping F :X —>Y from Hilbert space X to Hilbert space Y is "bounded" if there 
is a real number K such that for ever)' x e X , 

I|F(x)|| </sT ||x|| . (C.la) 

Halmos (1951) shows that a linear mapping of one Hilbert space into another is bounded iff (if 
and only if) it is norm-continuous (i.e., lim ||x„-x||=0 implies lim ||F(x„)--F(x)|| =0). The 

n — ►«> « -*"> 

smallest K which works in (C.la) for all xe X is called the norm of F, written ||F||. Then the 
best possible version of (C.la) is 

||F(x)||<||F||||x||. (C.lb) 

A bilinear functional T : X x Y — > R is "bounded" if there is a real number K such that for 
each (x, y) e X x Y 

|T(x,y)| <LK ||x|| ||y|| . (C.lc) 

A bilinear T is norm-continuous iff it is bounded. The smallest K which works in (C.lc) for all 
(x,y) g X xY is called the norm of T and written ||T||. If X and Y are Hilbert spaces whose dot 
products are written with and if T :X xY —>R is a bounded bilinear functional, T(x,y) will 
often be written as xoTny. Thus - 

xoTny := T(x,y) , (C.ld) 

and the best possible version of (C.lc) is 

| x □ T d y | <||x!!||T|j llyll . (C.le) 

The real number xnToy depends linearly on each of x, T and y if the other two are fixed. 


I 


September 19, 1988 



George E. Backus 


Confidence Set Inference 8 1 


If X and Y are Hilbert spaces, the set of all bounded linear mappings F :X —tY will be 
denoted by BL(X — »y), and the set of all bounded bilinear functionals T:X x Y —>R will be 
denoted by X ® Y. The members of A' ® Y are "tensors over X x Y If linear combinations are 
defined by (A.2a), both BL(X -> Y) are X ® Y are real vector spaces. In fact, if N is any positive 


integer, and F, e BL(A' T), T, e X ® Y , and a, e R for each i e { 1 N), then 

II X a i Ft 11-2 1^1 (C.2a) 

i-t i-t 

II S T f || < X l«il IITJ. (C.2b) 

1=1 1=1 

Moreover, if Z is also a real Hilbert space, and G e BL(T — >Z), then 

||GoF||<||G||||F||. (C.2c) 

Inequalities (C.2b,c) follow immediately from the definitions of the norms, while (C.2a) is a 
consequence of two facts about any Hilbert space Y : if a e R and y, $ e Y then 

MI=M llyli (C.3a) 

and 

lly+yil ^llyll + lly|| • (C.3b) 

The triangle inequality, (C.3b), is itself an immediate consequence of the Schwarz inequality, 
lyoyi Sllyll llfll . (C.3c) 

which in turn follows from the fact that y' cty' £ 0 for all y' e Y (Halmos, 1951). 


When X and Y are Hilbert spaces, every function in BL(A' -4 Y) can be thought of as a ten- 
sor in Y ®X and vice-versa. This correspondence is useful both to simplify notation and because 
it permits any operation applicable to either a tensor or a linear mapping to be transferred 
immediately to the other. To establish the correspondence, let F e BL(A — » T), and define 
T F : Y xX — » R by requiring that for each (y,x)e Y xX 

T p (y,x) = y □ F (x) (C.4a) 
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Clearly T F is a bilinear functional on YxX. Moreover, by (C.lb) and (C.3c), 
|T f -(y,x)|^||y||||F||||x||,soT f € Y®X and ||1> || <||F||. In fact, 

I|T f ||=||F|| . (C.4b) 

To see this, let xj, x 2 , ••• be an infinite sequence of nonzero vectors in X such that 
lim ||F(x„)||/||xJ=||F||. Let y„ :=F(x„). Then T F (j- n ,x fl )=||yJ| ||F(x„)|| so 

n — 

lim |T F (y„,x n )|/|jxJ ||yJ=||F||. 

Therefore ||T F || *||F||. Since also ||T F || <||F||, (C.4b) is established. 

Equation (C.4a) is a rule which assigns to each bounded linear function F e BL(X -*Y) a 
tensor T f e Y ® X . That is, (C.4a) defines a function BL(X -> Y ) —> Y ® X such that for each 
FeBL(X->T), 

3(F) = T f . (C.4c) 

It follows immediately from (A.2a) that J is linear, and (C.4b) shows that $ preserves norms. 

To show that 3 is a bijection from BL(X xY) to Y ®X, an inverse for it will be con- 
structed. First, suppose that / e BL(T —»/?). Then (Halmos, 1951) there is a unique vector 
y f e Y such that for every ye Y 

f(y) = yoy f . (C.5a) 

Now suppose Te 1' ®X. For any fixed xeX , consider the functional / x : Y — >R defined by 
requiring that for each ye 7 

/ x ( y) = T(y,x). (C.5b) 

Denote this functional f x by T(-,x). Clearly f x e BL(y — »F), and in fact ||/ x || <||T|| ||x||. 
Therefore (C.5a) applies to / =/ x . In Y there is a unique vector yj t , or v X( . jX) , such that for every 
ye /x(y)=y a yT('. x )- Thatis 

T(y,x) = y ny T (-,x) . 
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Now define F T : X — > Y by requiring that for each x e X 

F t (x) = y T C, x) . (C.5c) 

Then F x is uniquely determined by the fact that for each xeX and ye T, 

T(y, x) = y d F r (x) . (C.5d) 

Finally, define f: Y ®X — ► BL(X — » Y ) by requiring that for each Te Y ®X, 

$(T) = F r . (C.5e) 

Comparing (C.4a) with (C.5d) shows that F =F T iff (if and only if) T = T f . That is, 
$:Y ®X — >BL(X -*Y) is both a left and a right inverse forjf:BL(X —>Y)—>Y ®X. There- 
fore, jTand .Tare both bijections, and each is the inverse of the other. Since J is linear, its inverse 
must be linear. The linearity of J can also be inferred directly from the uniqueness of the F T 
defined by (C.5d). Of course, (C.4b) now implies 

11^x11 = ||T||. 

Henceforth it will be convenient to confuse the bounded linear mapping F :X —>Y with the 
tensor T F e Y ®X, and to write T^ as F. Then, in the notation of (C.ld), 

ynFnx = ynF(x) (C.6a) 

for ever}' xe X and ye Y. If either F or F is given (C.6a) uniquely determines the other. Furth- 
ermore, (C.4b) can be written 

HF|| = ||F||. (C.6b) 

Equation (C.6a) suggests writing the vector F(x) as Fdx, so 

F □ x := F (x) . (C.6c) 

Then (C.6a) becomes 

y o F □ x = y o (F □ x) . (C.6d) 

The vector Fdx depends linearly on each of F and x when the other is fixed, and moreover (C.lb) 
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takes the Schwarz- like form 

HFdxII < ||F|| ||x!| . (C.6e) 

A special case of (C.6) is that in which Y =R . Then F is a linear functional / :X ->R, F 
is the corresponding vector \ f e X , and (C.6a) reduces to (C.5a). 

A first application of the correspondence (C.6a) between tensors and bounded linear map- 
pings will be to define the transpose of each. For Fe Y ® X , the definition of the transpose F r is 


simple: forever)' (y,x)e ^ xX, 

F r (x, y) = F(y, x) . (C.7a) 

Evidently F r e X ® Y , 

||F|| = |1F 7 '|| (C.7b) 

and 

(F r ) r =F. (C.7c) 

Moreover, ifF, G e Y ®X and a ,b e R , then clearly 

(aF + bG) T =aF T +bG T . (C.7d) 


Now suppose F e BL(X — > Y ). Then F J , the transpose of F , can be defined as the mapping in 
BUT —>X) corresponding to the tensor F r e X ® Y . Thus F J : Y — >X is the unique function 
with domain Y and codomain X such that for every (x, y) e X x Y , 

ynF(x) = xnF r (y) . (C.7e) 

Without a discussion equivalent to the introduction of tensors, it is not clear that a function 
F t : Y —>X with property (C.7e) exists at all. The tensor argument shows that a unique F T 
exists, is linear and bounded, and satisfies (C.7b,c,d). 

It will be useful to write F r (y) as y □ F. Thus, if y e Y and F e BL(A — » Y), then invoking 
(3.9c) gives 

ynF F T (y) = F T ay (C.8a) 
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The "dot product" y oF is bilinear in y and F, and (C.6e) and (C.7b) imply 

lly oF|| < ||y|| ||F|| . (C.8b) 

Equation (C.7e) can now be rewritten as yo(Fox) = xo(yoF), which is (yoF)nx. Thus, (C.6d) 
and (C.7e) now yield 

y □(Fo\) = (ycF)cx = yoFDX = xaF T Dy . 

The correspondence (C.6a) also provides a simple way to define symmetric and positive 
definite linear mappings. A mapping F e BL(X -*X) is symmetric if F T =F, and positive 
definite if xdF ox >0 whenever xe X and x*0. 

Composition is an operation more easily defined for functions and then transferred to ten- 
sors. Suppose that X, Y, and Z are Hilbert spaces and Ge Z ® Y, and Fe Y ®X. Then G □ F is 
defined as the member of Z®X such that for each x e X 

(GoF)qx := G n(Fnx) . (C.9a) 

If W is another Hilbert space and H e W ® Z , then (A. lb) implies 

H n (G oF) = (H □ G) □ F . (C.9b) 

The identity 

(GdF) t =F r oG r (C.9c) 

and the corresponding result for G and F come immediately from the following tedious but sim- 
ple application of the definitions and the various associative laws: for every x e X and z eZ, 
x □ (G □ F)^ - d z = zd(GdF)dx = zd[(GdF)gx] = zd[Gd(Fdx)] = (zdG)d(Fdx) = 
(Fdx)d(zoG) = (xdF 7 ) o(G r dz) = xd [F r d(G t dz)] = xn [(F t □G t )dz] = 
x □ (F r □ G 7 ) □ z. Then (C.9c) and (C.8a) imply that for every z e Z 

(zdG)dF = zd(GdF) . (C.9d) 

The identity tensor on X is defined as the tensor I x eX ®X which corresponds via (C.6a) 
to the identity mapping I x e BL(X —>X). To any (x,x)eX xX, I x assigns the value 
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I x (x,x) = x dI x dx = x dx . (C.lOa) 

Then evidently 

/J = / x . (C.lOb) 

If F e BL(X — »T) has an inverse, F _1 : Y — >X, then so does F r e BL(T — »A'), and 
(F 7 ) -1 = (F -1 ) 7 . (C.lOc) 

The proof is this. Since X and Y are complete, the Banach inverse theorem (Luenberger, 1969, p. 
149) says that F _1 e BL(Y —>X). The tensor in X ®Y corresponding to F -1 is written, of 
course, as F -1 . Now (F -1 ) 7 e BL(X — » Y) does exist, and (F -1 ) 7 oF 7 = (F oF -1 ) 7 = / 7 = I Y , 
while F 7 o (F -1 ) 7 = (F _1 oF) 7 = /J = / x . Thus (F -1 ) 7 is a right and a left inverse for F 7 . 
Hence (F 7 ) -1 exists, and satisfies (C.lOc) 

Another application of the correspondence (C.6a) is in the construction of dyads, which are 
very easy to define as tensors. Let X and Y be real Hilbert spaces and let xe X and ye T. 
Define the "dyad" Sy e X ® Y by requiring that for every x e X and ye Y 

x □ (xy) □ y = (x Dx)(y Dy) . (C.lla) 

The dyad xy is often written x ® y. Evidently 

(xy) 7 =yx. (C.llb) 

For any a e R and xe X, ax will also be written xa. With this convention, the mapping in 
BL(F —>X) which corresponds to the tensor Xy is determined by the fact that for each v e Y 

(Xy)ay = X(y°y) , (C.llc) 

and its transpose is determined by the fact that for each xe X 

x o (xy) = (x oX)y . (C. 1 1 d) 

There is still another pair of associative laws for dyads. Suppose X , Y, Z are real Hilbert spaces, 
and Fe Z ®X, and Ge Y ®Z, and xe X, and ye Y. Then for any ye Y, [Fci(xy)]oy = 
F □ [(xy) ny] = F □ [x(y □}’)] = (F Dx)(y □ y) = [(F cx)y]Dy. Therefore 
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F □ (xy) = (F nx)y . (C.lle) 

Similarly 

(xy) □ G = x(y □ G) . (C. 110 

It will be necessary to define the integral of a function whose codomain is a set of tensors, 
because variance tensors of probability distributions arc such objects. Suppose that G is an arbi- 
trary set and p is a probability distribution (measure) on Cl (see appendix A and Halmos, 1950). 
For a large class of functionals / :Cl->R, called the p -intcgrable functionals, the integral 

J dp (co)f (co) is defined, and is usually written as an expected value, 

n 

E p [f (<cw)] := J dp (co)f (co) . (C.12a) 

n 

Now suppose that X and Y are Hilbert spaces and T : £2 -» X 0 Y . For each fixed (x, y) e X ® Y , 
£»|-*XDT(Gt))Dy defines a functional on Cl. Suppose that for each (x,y) e X xY this functional 
is p -integrable. Then it is possible to define the integral of T(co) with respect to p , written either 

J dp (co)T(co) or E p [T(©)] . 
a 

By definition, E p [T(co)] :X xY —>R, and this functional is defined by requiring that for each 
(x,y)e X xY 

E p [T(ct))](x,y) := E p [xnT(£») D y] . (C.12b) 

Clearly E p [T(o)] is bilinear in x and y. If it is also bounded, then E p [T(tu)] e X 0 Y , and the 
function T.: Cl ~^X ® Y is called p -integrable. For example, if the functional a K ||T(oj)|! is p - 
integrable, so is T. If T isp -integrable, then the definition (C.12b) of its integral can be rewritten 

xB£ p [T(m)]ny := £,[x Q T(<o)oy] . (C.13a) 

One consequence of (C.13a) is that if IF and Z are also Hilbert spaces, and P€ W ®X and 
Q e Y ®Z, then P □ T and T dQ are integrable, and 

FaE p [T(co)] = E p [PaT(co)} (C.13b) 
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and 

E p [T(©)] □ Q = [T(ftf) □ Q] . (C.13c) 

In the special case of (C.12) which arises in CSI, £2 is the data space Y, p is the probability 
distribution p E of the random error vector Sy, and X is either R or Y . Following the convention 
introduced in (B.5), E pi [T(y)] will be written £[T(<5y)]. Then £[<5y] is defined as a member of 
R ® Y, i.e., of Y, and clearly (B.6a) implies 

£[<$y] = 0. (C.14a) 

The variance tensor of Sy is defined as 

V := £ [(<5y)(<5y)] , (C.14b) 

because of (C.14a). From (C.13a), if y and ye f then 

y □ Vo5 > = ^[(yo5y)(5yDy)] • 

Therefore, from (B.8), 
ynVo5> = yo?- 
Then, from (C.lOa), 

V = I r . (C.14c) 

Thus the dot product (B.7) is the only dot product on Y which makes V, the variance tensor of Sy, 
equal to the identity tensor on Y. If p E is Gaussian, then adopting on Y the dot product (B.7) 
gives Pe the density' function 

dp E (y) = (2tf )~ D 1 1 y| \ 2 )doy (C.15a) 

where da y is the volume element in Y defined by the dot product (B.7). If some other dot pro- 
duct, yo J’, is adopted for Y, then V is defined by (C.14b) and V*Iy. A Gaussian p E will have a 
density function given by 

dp E (y) = (2 k)~ d ,7 (det \T*exp(-V6y o V" 1 oy)d?y (C. 1 5b) 

where dPy is the volume element in Y defined by the dot product o. 
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APPENDIX D. THE EIGENSTRUCTURE OF A BOUNDED LINEAR FUNCTION WITH 
FINITE DIMENSIONAL CODOMAIN 

When the model space X and the data space Y are Hilbert spaces, and the data function 
F :X -+Y is linear, section 7 describes the calculations needed to optimize resolution in 
confidence set inference. Those calculations depend on the eigenstructure of F (its singular value 
decomposition). That eigenstructure is determined by the facts that 

dim Y <oo (D.la) 

and 

\\F\\<~. (D.lb) 

The algebraic discussion in Golub and Van Loan (1983, p. 16) suffices for computations, but the 
geometrical discussion given here may be a useful supplement for readers who, like the author, 
find it easier to think geometrically. 

The geometrical discussion is based on orthogonality. If X is a real Hilbert space and 
x,xeX, and if xnx=0, then one writes "x±x", read "x is orthogonal to x." A set 
{£j, ^ 2 > • • • } C-X is "orthonormal" if 

=5y (D.2) 

(5y = 1 if i—j and <5, y - =0 if i * j). Every orthonormal set is linearly independent. 

If xe X, U cl and V cX, then x±U means that xiu for every ue U, and U ±V 
means that u 1 v for ever)’ ue U and v e V. Define 

xoU := (xdu:ug U) 

and 

U oV := (uDv:ue U and veV). 

Then xJLU iff xoU = (0), and U 1 V iff U □ V = {0). 
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If xg X and also xlA’ then xnx = 0, so x = 0. The most useful corollary of this observa- 
tion is that if Xj and x 2 e X and Xj dx = x 2 dx for all x e X, then (xj -x 2 )_LA' so Xj =x 2 . 

If U szX and V cX and U ±V, then U + V is written U © V . Every vector in U © V can 
be written in exactly one way as u + v with u e U and v e V . To see this, suppose u, u' g U and 
v, v' e V. Then (u-u')d(v'- v) = 0. If also u + v = u' + v', then u-u' = v'-v = 0. 

If U cl, then 

(/ 1 := (x:xeA and x JL E7 } . 

The set U 1 is the "orthogonal complement" of U . It is always a subspace of X , even if U is not, 
and it is closed in the sense that if X],x 2 , • • • e U 1 and lim ||x„ -x|| =0 then xg U x (Halmos, 

1951). Clearly U LU 1 so 

U c (U 1 ) 1 . 

If U is itself a subspace of X , then (Halmos, 1951) 

U c - (U 1 ) 1 (D.3a) 

where U c is the closure of U . If U is a closed subspace of X , then U c - U so 
U = (U X ) L ; (D.3b) 

in this case 

X = U eu x . (D.3c) 

In particular, (D.3c) holds if dim U <°°, because every finite-dimensional subspace of X is 
closed. Equation (D.3c) means that for each x e X there are unique vectors x v and x^ such that 
x v g U , xjj g U 1 , and 

x = x v + xfr . (D.3d) 

Of course it is also true that 

x V DXu=0. (D.3e) 
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If U is a closed subspace of X , (D.3d) permits the introduction of functions P v :X — >X 
and Qu :X —>X defined by requiring that for each x e X 

Pu(x) = *u (D.4a) 

Qu ( x ) = x u ■ (D-4b) 

The function P v is called the orthogonal projector of X onto U . It is a suijection onto U, and 
p u | u =i u . Clearly Q v is Py. From the uniqueness of \jj in (D.3c) it follows that Pa is 
linear. From (D.3d) it follows that 

l|x|| 2 = ||x t/ || 2 + ||x^|| 2 , (D.5) 

so Hxj/ll <||x||, and hence li/’t/ li ^ 1. Evidently 

ll^c/ll = 1 if (D.6a) 

and 

||j2t/ll = l if U*X. (D.6b) 

Also, from the definitions, 

Pu + Qu-h (D6c) 

P u oP u =P u (D.6d) 

Qu o Qu = Qu (D.6e) 

Puo Q u =Q u oP u =0. (P.6f) 

Moreover, if x and xe AT, then \ DPy(x) = xo% = x^ □ £(/ = xy = Pa(x)oi = \aPu(x), so 

Pl = Pu- (D.6g) 

Suppose U is a finite dimensional subspace of X. Let dim U -N <°®, and let 
be any orthonormal basis for U (many can always be found; Halmos, 1958). Then 

N 

Pf/ = L X, x, . (D.6h) 

l 

To prove this, let x g X . Then x v e U , so 

hi 

Xu = £ 0. i 
1=1 
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where 

a-, = *,• □ x v . 

But £, e U so oxjj=0. Thus 
a i = £,• □ x . 

Therefore 

N N N N 

Pu □ X = Xu = £ a-t = £ £,(*,- ax) = £ [(£,• *,)ox] = ( £ *i • 

i«l i=l i=l i=l 

This proves (D.5h). 

The foregoing properties of orthogonality can now be used to obtain the eigenstructure of 
F :X — » y when X and Y are Hilbert spaces, F is linear and bounded, and (D.l) holds. First con- 
sider the null space of F , defined as 

N>:«{x:x 6X and F(x) = 0}. (D.7a) 

Since F is linear, N Ip is a subspace of A'. Since F is continuous, N F is closed. Therefore 


X = N f ® N/ . (D.7b) 

In the spirit of the conventions (C.6c) and (C.8a), define 

FaX:**{Fax:xeX) (D.8a) 

and 

y □ F := {yoF : y e Y) . (D.8b) 

Then, clearly, 

FaX =F(X)zY (D.8c) 

and 

Y oF = F T (Y)zX . (D.8d) 


Since Y dF is the image of the linear function F T :Y —>X whose domain is finite-dimensional. 
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dim ( Y □F)<«> (Halmos, 1958). Define 

M := dim (Y dF) . (D.8e) 

The identity 
(yoF)ox = y □(Fdx) 

holds for every xeX. Therefore x ± (T □ F) iff F □ x = 0, i.e., iff x e N F . It follows that 
N F =(y D F) 1 , (D.8f) 

and, by (D.2a) 

N/ = (T dF) c . 

But, being finite-dimensional, Y nF is closed, so (Y nF) c =Y nF, and hence 
y □ F = N/K (D.8g) 

Then (D.7b) can be written 

X = © (Y d F) . (D.8h) 

Equation (D.8h) resolves X into the two pieces and Y dF. The function F \ Np is 
trivial, simply 0, so to study F it suffices to study the function 

F := F | y □ F . (D.9a) 

The function F has two important properties. Firstly, 

F (Y dF) =F(X) . (D.9b) 

To see this, take xeX and, using (D.7h), write 

X = Xj!- DF + Xy oF . 

Since Xy oF € N F , therefore F(xy DF ) = 0, so F(x) = F(x f oF ) = F(x F oF ). This proves (D.9b). The 
second important property of 

F : Y aF->F(X) (D.9c) 
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is that it is an injection. For suppose x € Y □ F and F (x) = 0. Then F (x) = 0, so x e N f . Then by 
(D.8f), x e (Y oF) 1 as well as Y oF, so x = 0. Since F is linear, it is injective. But this fact and 
(D.9b) show that F is a bijcction, and thus has an inverse, 

F~ l :FQC)-*Y oF. (D.9d) 

Now consider the function F T o F :Y nF —*Y aF. This function is symmetric, for 
(F t o F) t =F t o (F t ) T -F t o F . It is also positive definite, for if xe Y oF and x*0 then 
F(x)* 0, so 0 < || F □ x|| 2 = (Fnx)o(F dx) = (xoF r )o(Fox) = xd(F t dF)dx. The domain 
Y dF of the symmetric, positive definite function F T oF has finite dimension M. Therefore, 
counting repetitions of multiple eigenvalues, F T oF has M positive real eigenvalues (Halmos, 
1958). These can be ordered as 

0i 2 ></> 2 2 > • • • Z<p%, >0. (D.lOa) 

Furthermore, Y oF has an orthonormal basis {£, £ w } consisting of eigenvectors of F r oF 

with the eigenvalues (D.lOa). That is 

F t o F(Z i )=<p?x i (D.lOb) 

for i=l, ...,M . This last equation can also be written 
F r oF = 0 , 2 X,' . 

Then, by (D.2), 

□ F T □ F □ x,- = 0 2 Sij . 

Therefore, by (C.8a), 

(F oHj) □ (F ox,-) =0 ? Sij . (D.lOc) 

Next, define 

5- :=0r 1 F(x J ) . (D.lla) 

By (D.lOc), is an orthonormal subset of Y oF. Since dim Y d¥=M, {yj,...,^} >s 

an orthonormal basis for Y dF. Morcoever, from the definition (D.lla), 
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F(W=<h$i ( D - llb > 

for i = 1 . An immediate consequence of (D.l lb) is that in Y ® (Y dF) 

F=£M, *,' • < D - Uc ) 

1*1 

To see this, note that 

i-i 


and that 

F = F □Iy D F = I (Fd *,-)*/ = L • 

i=t «=l 

Now (D.l lc) follows from (D.l lb). A useful consequence of (D.l lc) is that 
i=i 

To prove (D.lle), denote its right-hand side by G, and observe from (D.lld) and (D.llc) that 
G dF = FnCr=Iy D p. 


Finally, from (D.8h) and (D.llc) it will be shown that 

F =£*,■?,•*« . ( D - 12a ) 

i=t 

J where now the right side of (D.12a) is interpreted as a tensor in Y ®X. Equation (D.12a) exhi- 
bits the eigenstructure of F, its singular value decomposition. The positive real numbers 
0 j are the eigenfactors or singular values of F , {xj, .... x M } is an eigenbasis for F , and 
{$i> — > Sm ) is 1112 corresponding cobasis. 

To prove (D.12a), let G denote its right-hand side. It will suffice to show that F □ x = G □ x 
for every xe X. But if x e X then 

M M 

F DX = F(x) = F(Xy D p) =F(Xy d f) = F OXy n F = ( £ $ i Si ^i) D X ’l' oF = S/=l 0/ $ i&i DX 1’oF) • 

i-1 
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Since £,■ e Y oF, x, oxy dF =0, so $,• oxy oF = £,• nx. Therefore 
M M 

F DX = £ <t>i Si (x,- DX) = ( X ft ft) DX . 

1=1 1=1 

This completes the proof of (D.12a). 

If N f = {0}, so Y oF=X, then F =F, and (D.lle) will hold forF as well as F. In general, 
N F * {0}, and then F -1 does not exist. When N F * {0}, F is said to have the eigcnfactor or 
singular value 0 as well as the positive eigenfactors <p\ - ' -4>m > ®- Even when N F * {0}, 

F t :Y ->X exists, and it follows immediately from (D.12a), (C.7d) and (C.l lb) that 

F T = Z<p>*i$i. (D.12b) 

/-I 

Thus F and F r have the same eigenfactors, and the eigenbasis for either is the cobasis for the 
other. 

The foregoing discussion can be carried out verbatim when D =dim Y =°° as long as 
F :X -» Y is compact (Kato, 1976). The number M of nonzero eigenfactors will always obey 

M < (dimX) a (dim Y) . (D.13) 

This point is of only academic interest in practical problems, because they never supply infinitely 
many data. 

The eigenstructure theorem (D.12a) can be applied either to the whole data function 
F :X — > T on the infinite-dimensional model space X or its restriction F^y.X^^Y to any 
- N -dimensional subspace X^) of X. The calculation of resolution in section 7 will be formally 
the same in either case. However, if a subspace can be found which permits tight bounds 
on the truncation errors (6.7) and also satisfies N «D, then section 7 should be applied to F( 0iA >) 
rather than F. Truncation will produce only a small loss of accuracy, and will permit the very 
large computational economy consequent on manipulating N xN rather than D xD matrices. 
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Table 1 


p 

v(p) 

icr 2 

2.58 

10" 3 

3.29 

1CT 4 

3.89 

10- 5 

4.42 

KT 6 

4.89 

io- 7 

5.33 


Half-length v(p) of the symmetric confidence interval with failure rate p for a one- 
dimensional Gaussian with mean 0 and variance 1. 



m t 


Table 2 


/ 

(Pr(a) 2 )% 

\Kf n (p)\ 

!*f,](p)l 

1 

107.77 

.07 

.009 

2 

22.59 

.11 

.015 

3 

23.01 

.19 

.024 

4 

17.86 

.32 

.042 

5 

11.96 

.57 

.075 

6 

9.36 

1.03 

.135 

7 

7.93 

1.87 

.25 

8 

4.73 

3.42 

.45 

9 

6.78 

6.30 

.83 

10 

4.59 

11.67 

1.53 

11 

4.34 

21.69 

2.85 

12 

3.62 

40.5 

5.32 


Confidence sets for the Gauss coefficients when the quadratic bound is the heat flow bound. 
Column 2 gives the value in microTeslas of the rms of the Gauss coefficients p ) of degree / 
at the CMB, calculated from Langel and Estes, 1982. Columns 3 and 4 give the half-lengths of 
the confidence intervals for P P(a ) if the crustal error is systematic, or random and uncorrelated. 
The failure ratep is 1CT 4 . Using p = 10 -2 would leave column 3 unaffected and would decrease 
the entries in column 4 by about ten percent. 



